Zinc Finger Domains and Methods of Identifying Same



TECHNICAL FIELD 

This invention relates to DNA-binding proteins such as transcription factors.

BACKGROUND 

Most genes are regulated at the transcriptional level by polypeptide 
transcription factors that bind to specific DNA sites within the gene, typically in
promoter or enhancer regions. These proteins activate or repress transcriptional 
initiation by RNA polymerase at the promoter, thereby regulating expression of the 
target gene. Many transcription factors, both activators and repressors, are modular in 
structure. Such modules can fold as structurally distinct domains and have specific 
functions, such as DNA binding, dimerization, or interaction with the transcriptional 
machinery. Effector domains such as activation domains or repression domains retain 
their function when transferred to DNA-binding domains of heterologous 
transcription factors (Brent and Ptashne, (1985) Cell 43:729-36; Dawson et al., (1995)
Mol Cell Biol 15:6923-31). The three-dimensional structures of many DNA-binding
domains, including zinc finger domains, homeodomains, and helix-turn-helix domains,
have been determined from NMR and X-ray crystallographic data. 

SUMMARY 

The invention provides a rapid and scalable cell-based method for identifying 
and constructing chimeric transcription factors. Such transcription factors can be used, 
for example, for altering the expression of endogenous genes in biomedical and 
bioengineering applications. The transcription factors are assayed in vivo, i.e., in 
intact, living cells. Also within the invention are novel nucleic acid binding domains 
that can be discovered, for example, by applying the method in a screen of genomic 
sequences. 

The invention features a method of identifying a peptide domain that
recognizes a target site on a DNA. This method is sometimes referred to herein as the
"domain selection method" or the "in vivo screening method." The method includes
providing (1) cells containing a reporter construct and (2) a plurality of hybrid nucleic
acids. The reporter construct has a reporter gene operably linked to a promoter that
has both a recruitment site and a target site. The reporter gene is expressed above a
given level when a transcription factor recognizes (i.e., binds to a degree above
background) both the recruitment site and the target site of the promoter, but not when
the transcription factor recognizes only the recruitment site of the promoter. Each
hybrid nucleic acid of the plurality encodes a non-naturally occurring protein with the
following elements: (i) a transcription activation domain, (ii) a DNA binding domain
that recognizes the recruitment site, and (iii) a test zinc finger domain. The amino
acid sequence of the test zinc finger domain varies among the members of the
plurality of hybrid nucleic acids. The method further includes: contacting the
plurality of nucleic acids with the cells under conditions that permit at least one of the
plurality of nucleic acids to enter at least one of the cells; maintaining the cells under
conditions permitting expression of the hybrid nucleic acids in the cells; and identifying a
cell that expresses the reporter gene above the given level as an indication that the cell
contains a hybrid nucleic acid encoding a test zinc finger domain that recognizes the
target site.

The DNA binding domain, i.e., the domain that recognizes the recruitment site
and does not vary among members of the plurality, can include, for example, one, two,
three, or more zinc finger domains. The cells utilized in the method can be
prokaryotic or eukaryotic. Exemplary eukaryotic cells are yeast cells, e.g.,
Saccharomyces cerevisiae, Schizosaccharomyces pombe, or Pichia pastoris; insect
cells such as Sf9 cells; and mammalian cells such as fibroblasts or lymphocytes.

The "given level" is the amount of expression observed when the transcription
factor recognizes the recruitment site, but not the target site. The "given level" in
some cases may be zero (at least within the limits of detection of the assay used).

The method can include an additional step of amplifying a source nucleic acid
encoding the test zinc finger domain from a nucleic acid, e.g., genomic DNA, an
mRNA mixture, or a cDNA mixture, to produce an amplified fragment. The source
nucleic acid can be amplified using an oligonucleotide primer. The oligonucleotide
primer can be one of a set of degenerate oligonucleotides (e.g., a pool of specific
oligonucleotides having different nucleic acid sequences, or a specific oligonucleotide
having a non-natural base such as inosine) that anneals to a nucleic acid encoding a
conserved domain boundary. Alternatively, the primer can be a specific
oligonucleotide. The amplified fragments are utilized to produce a hybrid nucleic
acid for inclusion in the plurality of hybrid nucleic acids used in the aforementioned
method.
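
As an illustration of the "pool of specific oligonucleotides" sense of a degenerate primer, the following minimal sketch expands a degenerate sequence into the specific sequences it encodes. It assumes the standard IUPAC ambiguity codes; the primer shown is hypothetical and is not one of the primers of this document, and the function name is chosen for illustration only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every specific oligonucleotide encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer, for illustration only.
pool = expand_degenerate("GARAARCCN")
print(len(pool))  # 2 * 2 * 4 = 16 specific sequences
```

The size of the pool grows multiplicatively with each degenerate position, which is one reason a single oligonucleotide carrying inosine may be used in place of a large pool.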

The method can further include the steps of (i) identifying a candidate zinc 
finger domain amino acid sequence in a sequence database; (ii) providing a candidate 
nucleic acid encoding the candidate zinc finger domain amino acid sequence; and (iii) 
utilizing the candidate nucleic acid to construct a hybrid nucleic acid for inclusion in 
the plurality of hybrid nucleic acids used in the aforementioned method. The database 
can include records for multiple amino acid sequences, e.g., known and/or predicted 
proteins, as well as multiple nucleic acid sequences such as cDNAs, ESTs, genomic 
DNA, or genomic DNA computationally processed to remove predicted introns. 
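
Step (i) above can be sketched as a pattern search over protein records. This minimal sketch assumes the spacing Cys-X(2-5)-Cys-X(12)-His-X(3-5)-His, which is the spacing implied by the polypeptide motifs recited in this document; the function name is hypothetical, and the test sequence is the Zif268 three-finger region as commonly published, which should be checked against SEQ ID NO:21 before being relied upon.

```python
import re

# Cys-X(2-5)-Cys-X(12)-His-X(3-5)-His; the lookahead allows a hit to be
# reported at every start position, even if hits were to overlap.
C2H2 = re.compile(r"(?=(C.{2,5}C.{12}H.{3,5}H))")


def find_candidate_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (start index, matched segment) for each candidate zinc finger."""
    return [(m.start(), m.group(1)) for m in C2H2.finditer(protein)]


# Assumption: Zif268 finger region quoted from the published literature.
ZIF268_REGION = (
    "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
    "FQCRICMRNFSRSDHLTTHIRTHTGEKP"
    "FACDICGRKFARSDERKRHTKIHLRQKD"
)

for start, segment in find_candidate_fingers(ZIF268_REGION):
    print(start, segment)
```

A search of this kind only nominates candidates; whether a candidate domain recognizes a given target site is determined by the in vivo method.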

If desired, the method can be repeated to identify a second test zinc finger 
domain that recognizes a second target site, e.g., a site other than that recognized by 
the first test zinc finger domain. Subsequently, a nucleic acid can be constructed that 
encodes both the first and the second identified test zinc finger domains. The encoded 
hybrid protein would specifically recognize a target site that includes the target site of 
the first test zinc finger domain and the target site of the second test zinc finger 
domain. 

The invention also features a method of determining whether a test zinc finger
domain recognizes a target site on a promoter. This method is sometimes referred to
herein as the "site selection method." The method includes the steps of providing a
reporter construct and a hybrid nucleic acid. The reporter construct has a reporter gene
operably linked to a promoter that includes a recruitment site and a target site; the
reporter gene is expressed above a given level when a transcription factor recognizes
both the recruitment site and the target site of the promoter, but not when the
transcription factor recognizes only the recruitment site of the promoter. The hybrid
nucleic acid encodes a non-naturally occurring protein with the following elements:
(i) a transcription activation domain, (ii) a DNA binding domain that recognizes the
recruitment site, and (iii) a test zinc finger domain. The method further includes:
contacting the reporter construct with a cell under conditions that permit the reporter
construct to enter the cell; prior to, after, or concurrent with the aforementioned step,
contacting the hybrid nucleic acid with the cell under conditions that permit the hybrid
nucleic acid to enter the cell; maintaining the cell under conditions permitting
expression of the hybrid nucleic acid in the cell; and detecting reporter gene expression
in the cell. A level of reporter gene expression greater than the given level is an
indication that the test zinc finger domain recognizes the target site.

The reporter construct and the hybrid nucleic acid can be contained in separate
plasmids. The two plasmids can be introduced into the cell simultaneously or
consecutively. One or both plasmids can contain selectable markers. The reporter
construct and the hybrid nucleic acid can also be contained on the same plasmid, in
which case only one contacting step is required to introduce both nucleic acids into a
cell. In yet another implementation, one or both of the nucleic acids are stably
integrated into a genome of a cell. For this method, as for any in vivo method
described herein, the transcriptional activation domain can be replaced with a
transcriptional repression domain, and a cell is identified in which the level of reporter
gene expression is decreased to a level below the given level.

Another method of the invention facilitates the rapid determination of a
binding preference of a test zinc finger domain by fusing two cells. The method
includes: providing a first cell containing the reporter gene; providing a second cell
containing the hybrid nucleic acid; fusing the first and second cells to form a fused
cell; maintaining the fused cell under conditions permitting expression of the hybrid
nucleic acid in the fused cell; and detecting reporter gene expression in the fused cell,
wherein a level of reporter gene expression greater than the given level is an
indication that the test zinc finger domain recognizes the target site. For example, the
first and second cells can be tissue culture cells or fungal cells. An exemplary
implementation of the method utilizes S. cerevisiae cells. The first cell has a first
mating type, e.g., MATa; the second cell has a second mating type different from the
first, e.g., MATα. The two cells are contacted with one another, and yeast mating
produces a single cell (e.g., MATa/α) with a nucleus containing the genomes of both
the first and second cells. The method can include providing multiple first cells, all
of the same first mating type, where each first cell has a reporter construct with a
different target site. Multiple second cells, all of the same second mating type and
each having a different test zinc finger domain, are also provided. A matrix is
generated of multiple pair-wise matings, e.g., all possible pair-wise matings. The
method is applied to determine the binding preference of multiple test zinc finger
domains for multiple binding sites, e.g., a complete set of possible target sites.
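
The matrix of pair-wise matings can be pictured as a table indexed by (target site, test zinc finger domain). The following minimal sketch shows only the bookkeeping; the strain names, expression values, and the measure function are hypothetical placeholders for an actual reporter readout.

```python
from itertools import product


def mating_matrix(reporter_strains, finger_strains, measure):
    """Record a reporter readout for every pair-wise mating.

    `measure(site, finger)` stands in for the experimental readout of the
    diploid obtained by mating the two strains (a hypothetical callback).
    """
    return {
        (site, finger): measure(site, finger)
        for site, finger in product(reporter_strains, finger_strains)
    }


def preferred_sites(matrix, given_level):
    """Group, per test domain, the target sites read out above the given level."""
    prefs = {}
    for (site, finger), level in matrix.items():
        if level > given_level:
            prefs.setdefault(finger, []).append(site)
    return prefs


# Toy data, for illustration only.
toy = {
    ("GCGT", "ZF_A"): 9.0, ("GCGT", "ZF_B"): 1.0,
    ("GACT", "ZF_A"): 1.2, ("GACT", "ZF_B"): 7.5,
}
matrix = mating_matrix(["GCGT", "GACT"], ["ZF_A", "ZF_B"],
                       lambda site, finger: toy[(site, finger)])
print(preferred_sites(matrix, given_level=2.0))
```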

The invention also provides a method of assaying a binding preference of a
test zinc finger domain. The method includes providing (1) cells, essentially all of
which contain a hybrid nucleic acid, and (2) a plurality of reporter constructs. Each
reporter construct of the plurality has a reporter gene operably linked to a promoter
with a recruitment site and a target site. The reporter gene is expressed above a given
level when a transcription factor recognizes both the recruitment site and the target
site of the promoter, but not when the transcription factor binds only the recruitment
site of the promoter. The target site varies among the members of the plurality
of reporter constructs. The hybrid nucleic acid encodes a hybrid protein with the
following elements: (i) a transcription activation domain, (ii) a DNA binding domain
that recognizes the recruitment site, and (iii) a test zinc finger domain. The method
further includes: contacting the plurality of reporter constructs with the cells under
conditions that permit at least one of the plurality of reporter constructs to enter at
least one of the cells; maintaining the cells under conditions permitting expression of
the nucleic acids in the cells; and identifying a cell that contains a reporter construct
and that expresses the reporter gene above the given level as an indication
that the reporter construct in the cell has a target site recognized by the zinc finger
domain.

A plurality of cells, each with a different target site, can be identified by the
above method if the test zinc finger domain has a binding preference for more than
one target site. The method can further include identifying the cell that exhibits the
highest level of reporter gene expression. Alternatively, a threshold level of reporter
gene expression is determined, e.g., an increase in reporter gene expression of 2, 4, 8,
20, 50, 100, 1000 fold or greater, and all cells exhibiting reporter gene expression
above the threshold are selected.
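
The two selection rules just described (highest expresser, or all cells above a fold threshold) can be sketched as follows. This is a minimal illustration with hypothetical clone names and values; treating any detectable expression as an unbounded fold increase when the given level is zero is an assumption of the sketch, not a rule stated in this document.

```python
import math


def fold_induction(level: float, given_level: float) -> float:
    """Fold increase of reporter expression over the given level."""
    if given_level == 0:
        # Assumption: with a zero baseline, any detectable signal is unbounded.
        return math.inf if level > 0 else 1.0
    return level / given_level


def select_clones(expression, given_level, fold_threshold=2.0):
    """Return (clone, fold) pairs at or above the threshold, highest first."""
    folds = {c: fold_induction(v, given_level) for c, v in expression.items()}
    kept = [(c, f) for c, f in folds.items() if f >= fold_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy readouts, for illustration only.
readouts = {"cloneA": 80.0, "cloneB": 12.0, "cloneC": 150.0}
selected = select_clones(readouts, given_level=10.0, fold_threshold=8.0)
print(selected)     # all clones at or above 8-fold
print(selected[0])  # the highest expresser
```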

The target binding site, for example, can be between two and six nucleotides
long. The plurality of reporter constructs can include every possible combination of
A, T, G, and C nucleotides at two, three, or four or more positions of the target
binding site.
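
For a sense of scale, every possible combination of A, T, G, and C at n positions gives 4^n target sites. A minimal sketch (the function name is hypothetical):

```python
from itertools import product


def all_target_sites(length: int) -> list[str]:
    """Enumerate every A/T/G/C combination for a target site of the given length."""
    return ["".join(bases) for bases in product("ATGC", repeat=length)]


for n in (2, 3, 4):
    print(n, len(all_target_sites(n)))  # 16, 64, and 256 sites
```
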
In another aspect, the invention features a method of identifying a plurality of
zinc finger domains. The method includes: carrying out the domain selection method
to identify a first test zinc finger domain and carrying out the domain selection
method again to identify a second test zinc finger domain that recognizes a target site
different from a target site of the first test zinc finger domain. Also featured is a
method of generating a nucleic acid encoding a chimeric zinc finger protein. The
method includes carrying out the domain selection method twice to identify a first and
second test zinc finger domain and constructing a nucleic acid encoding a polypeptide
including the first and second test zinc finger domains. The nucleic acid can encode a
hybrid protein that includes the two domains and that specifically recognizes a site that
includes two subsites. The subsites are the target site of the first test zinc finger
domain and the target site of the second test zinc finger domain.

In still another aspect, the invention features a method of identifying a DNA 
sequence recognized by zinc finger domains. The method includes: carrying out the 
site selection method to identify a first binding preference for a first test zinc finger 
domain, and carrying out the site selection method again to identify a second binding 
preference for a second test zinc finger domain. A nucleic acid can be constructed 
which encodes both the first and the second identified test zinc finger domains. The 
nucleic acid can encode a hybrid protein including the two domains that specifically 
recognizes a site that includes the target site of the first test zinc finger domain and 
target site of the second test zinc finger domain. 

The invention also features a method of identifying a peptide domain that
recognizes a target site on a DNA. The method includes providing (1) cells
containing a reporter construct and (2) a plurality of hybrid nucleic acids. The
reporter construct has a reporter gene operably linked to a promoter that has both a
recruitment site and a target site. The reporter gene is expressed below a given level
when a transcription factor recognizes (i.e., binds to a degree above background) both
the recruitment site and the target site of the promoter, but not when the transcription
factor recognizes only the recruitment site of the promoter. Each hybrid nucleic acid
of the plurality encodes a non-naturally occurring protein with the following
elements: (i) a transcription repression domain, (ii) a DNA binding domain that
recognizes the recruitment site, and (iii) a test zinc finger domain. The amino acid
sequence of the test zinc finger domain varies among the members of the plurality of
hybrid nucleic acids. The method further includes: contacting the plurality of nucleic
acids with the cells under conditions that permit at least one of the plurality of nucleic
acids to enter at least one of the cells; maintaining the cells under conditions
permitting expression of the hybrid nucleic acids in the cells; and identifying a cell that
expresses the reporter gene below the given level as an indication that the cell
contains a hybrid nucleic acid encoding a test zinc finger domain that recognizes the
target site. Additional embodiments of this method are as for the similar method
utilizing a transcription activation domain. Likewise, any other selection method
described herein can be performed using a transcriptional repression domain in place
of a transcriptional activation domain.

In another aspect, the invention features certain purified polypeptides and 
isolated nucleic acids. Purified polypeptides of the invention include polypeptides
having the amino acid sequence:

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Cys-X-Ser-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:68),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-His-X-Ser-Asn-Xb-X-Lys-His-X3-5-His (SEQ ID NO:69),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Ser-X-Ser-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:70),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Thr-Xb-X-Val-His-X3-5-His (SEQ ID NO:71),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Val-X-Ser-Xc-Xb-X-Arg-His-X3-5-His (SEQ ID NO:72),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-His-Xb-X-Arg-His-X3-5-His (SEQ ID NO:73),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Val-His-X3-5-His (SEQ ID NO:74),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Xc-Xb-X-Arg-His-X3-5-His (SEQ ID NO:75),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ala-His-Xb-X-Arg-His-X3-5-His (SEQ ID NO:150),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Phe-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:151),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-His-Xb-X-Thr-His-X3-5-His (SEQ ID NO:152),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-His-Xb-X-Val-His-X3-5-His (SEQ ID NO:153),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Ile-His-X3-5-His (SEQ ID NO:154),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:155),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Thr-His-Xb-X-Gln-His-X3-5-His (SEQ ID NO:156),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Thr-His-Xb-X-Arg-His-X3-5-His (SEQ ID NO:157),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-Lys-Xb-X-Ile-His-X3-5-His (SEQ ID NO:158),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Ser-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:159),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Gly-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:161),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-Glu-Xb-X-Arg-His-X3-5-His (SEQ ID NO:162),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-His-Xb-X-Arg-His-X3-5-His (SEQ ID NO:163),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-His-Xb-X-Thr-His-X3-5-His (SEQ ID NO:164),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-Lys-Xb-X-Arg-His-X3-5-His (SEQ ID NO:165),
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Ser-His-Xb-X-Arg-His-X3-5-His (SEQ ID NO:166), or
Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Thr-Asn-Xb-X-Arg-His-X3-5-His (SEQ ID NO:167),

wherein Xa is phenylalanine or tyrosine, Xb is a hydrophobic residue,
and Xc is serine or threonine. Nucleic acids of the invention include nucleic acids
encoding the aforementioned polypeptides.
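
A motif of the above form can be checked against a candidate polypeptide by translating it into a regular expression. The sketch below is illustrative only: the motif string is the SEQ ID NO:162 pattern written in plain form (X2-5 meaning two to five residues of any kind), the residue set used for the hydrophobic position Xb is an assumption, and the test sequences are hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
CLASSES = {
    "Xa": "[FY]",        # phenylalanine or tyrosine
    "Xb": "[AILMFVWY]",  # assumption: one possible set of hydrophobic residues
    "Xc": "[ST]",        # serine or threonine
}
# Order matters: ranges (X2-5) before counts (X3) before classes (Xa) before X.
TOKEN = re.compile(r"X(\d+)-(\d+)|X(\d+)|(X[abc])|(X)|([A-Z][a-z]{2})")


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a hyphen-separated motif into a compiled regular expression."""
    parts = []
    for lo, hi, count, cls, any_res, res in TOKEN.findall(motif):
        if lo:
            parts.append(f".{{{lo},{hi}}}")
        elif count:
            parts.append(f".{{{count}}}")
        elif cls:
            parts.append(CLASSES[cls])
        elif any_res:
            parts.append(".")
        else:
            parts.append(THREE_TO_ONE[res])
    return re.compile("".join(parts))


MOTIF_162 = "Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-Glu-Xb-X-Arg-His-X3-5-His"
pattern = motif_to_regex(MOTIF_162)
# Hypothetical sequences, for illustration only.
print(bool(pattern.fullmatch("YKCPECGKSFSRSDELVRHQRTH")))  # conforms to the motif
print(bool(pattern.fullmatch("YKCPECGKSFSRSAELVRHQRTH")))  # Asp replaced by Ala
```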
In addition, purified polypeptides of the invention can have an amino acid
sequence 50%, 60%, 70%, 80%, 90%, 93%, 95%, 96%, 98%, 99%, or 100% identical
to SEQ ID NOs: 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57,
59, 61, 63, 65, 67, 103, 105, 107, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137, 141, 143, 145, 147, 149, or 151. The polypeptides can be
identical to SEQ ID NOs: 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53,
55, 57, 59, 61, 63, 65, 67, 103, 105, 107, 111, 113, 115, 117, 119, 121, 123, 125, 127,
129, 131, 133, 135, 137, 141, 143, 145, 147, 149, or 151 at the amino acid positions
corresponding to the nucleic acid contacting residues of the polypeptide.
Alternatively, the polypeptides differ from SEQ ID NOs: 23, 25, 27, 29, 31, 33, 35, 37,
39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 103, 105, 107, 111, 113, 115,
117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 141, 143, 145, 147, 149, or 151
at one or more of the residues corresponding to the nucleic acid contacting residues of
the polypeptide. The purified polypeptides can also include one or more of the
following: a heterologous DNA binding domain, a nuclear localization signal, a small
molecule binding domain (e.g., a steroid binding domain), an epitope tag or
purification handle, a catalytic domain (e.g., a nucleic acid modifying domain, a
nucleic acid cleavage domain, or a DNA repair catalytic domain) and/or a
transcriptional function domain (e.g., an activation domain, a repression domain, and
so forth). The invention also includes isolated nucleic acid sequences encoding the
aforementioned polypeptides, and isolated nucleic acid sequences that hybridize under
high stringency conditions to a single stranded probe, the sequence of the probe
consisting of SEQ ID NO:22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
54, 56, 58, 60, 62, 64, 66, 102, 104, 106, 110, 112, 114, 116, 118, 120, 122, 124, 126,
128, 130, 132, 134, 136, 140, 142, 144, 146, 148, or 150 or the complements thereof.
The invention further includes a method of expressing in a cell a polypeptide of the
invention fused to a heterologous nucleic acid binding domain. The method includes
introducing into a cell a nucleic acid encoding the aforementioned fusion protein. A
nucleic acid of the invention can be operably regulated by a heterologous nucleic acid
sequence, e.g., an inducible promoter (e.g., a steroid hormone regulated promoter, a
small-molecule regulated promoter, or an engineered inducible system such as the
tetracycline Tet-On and Tet-Off systems).

The term "base contacting positions" refers to the four amino acid positions of
zinc finger domains that structurally correspond to amino acids arginine 73, aspartic
acid 75, glutamic acid 76, and arginine 79 of SEQ ID NO:21. These positions are
also referred to as positions -1, 2, 3, and 6. To identify positions in a query sequence
that correspond to the base contacting positions, the query sequence is aligned to the
zinc finger domain of interest such that the cysteine and histidine residues of the
query sequence are aligned with those of finger 3 of Zif268. The ClustalW WWW
Service at the European Bioinformatics Institute (http://www2.ebi.ac.uk/clustalw;
Thompson et al. (1994) Nucleic Acids Res. 22:4673-4680) provides one convenient
method of aligning sequences.
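
For a single finger with canonical spacing, the base contacting positions can also be read off relative to the first zinc-coordinating histidine, which follows position 6 directly. The sketch below is a simplification that stands in for a full alignment; it assumes the Cys-X(2-5)-Cys-X(12)-His-X(3-5)-His spacing, and the finger 3 sequence of Zif268 is quoted from the published literature and should be checked against SEQ ID NO:21.

```python
import re

# Group 1 captures the first zinc-coordinating histidine.
FINGER = re.compile(r"C.{2,5}C.{12}(H).{3,5}H")

# Distance, in residues, from each helix position back from that histidine.
# Helix numbering has no position 0, so -1 lies seven residues before it.
OFFSETS = {-1: 7, 2: 5, 3: 4, 6: 1}


def base_contacting_residues(finger: str) -> dict[int, str]:
    """Return the residues at helix positions -1, 2, 3, and 6 of one finger."""
    match = FINGER.search(finger)
    if match is None:
        raise ValueError("no Cys2-His2 zinc finger pattern found")
    first_his = match.start(1)
    return {pos: finger[first_his - off] for pos, off in OFFSETS.items()}


# Assumption: finger 3 of Zif268 as commonly published.
ZIF268_FINGER3 = "FACDICGRKFARSDERKRHTKIH"
print(base_contacting_residues(ZIF268_FINGER3))
```

The residues recovered for this sequence are arginine, aspartic acid, glutamic acid, and arginine, in agreement with the residues named in the definition above.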

The term "heterologous" refers to a polypeptide that is introduced into a context
by artifice, and that does not occur naturally in the same context. In distinction from
an endogenous entity, a heterologous polypeptide can have a polypeptide sequence
flanking it on at least one side that does not flank it in any naturally occurring
polypeptide. The term "hybrid" refers to a polypeptide which comprises amino acid
sequences derived from either (i) at least two different naturally occurring sequences;
(ii) at least an artificial sequence (i.e., a sequence that does not occur naturally) and a
naturally occurring sequence; or (iii) at least two different artificial sequences.
Examples of artificial sequences include mutants of a naturally occurring sequence
and de novo designed sequences.

As used herein, the term "hybridizes under stringent conditions" refers to
conditions for hybridization in 6X sodium chloride/sodium citrate (SSC) at 45°C,
followed by two washes in 0.2X SSC, 0.1% SDS at 65°C.
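
The salt concentrations behind these conditions follow from simple dilution arithmetic. The sketch below assumes the conventional 20X SSC stock of 3.0 M sodium chloride and 0.3 M sodium citrate, a recipe that is not stated in this document.

```python
# Assumption: conventional 20X SSC stock composition (not given in this document).
STOCK_20X = {"NaCl_M": 3.0, "citrate_M": 0.3}


def ssc_molarity(fold: float) -> dict[str, float]:
    """Molar concentrations of an SSC solution at the given fold strength."""
    return {name: molar * fold / 20 for name, molar in STOCK_20X.items()}


print(ssc_molarity(6))    # hybridization buffer, about 0.9 M NaCl
print(ssc_molarity(0.2))  # wash buffer, about 0.03 M NaCl
```

The thirty-fold drop in salt between hybridization and wash, together with the higher wash temperature, is what makes the wash step the more stringent of the two.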

The term "binding preference" refers to the discriminative property of a
polypeptide for selecting one nucleic acid binding site relative to another. For
example, when the polypeptide is limiting in quantity relative to the nucleic acid
binding sites, a greater amount of the polypeptide will bind the preferred site relative
to the other site in an in vivo or in vitro assay described herein.

As used herein, the term "recognizes" refers to the ability of a polypeptide to
discriminate between one nucleic acid binding site and a second competing site such
that, e.g., in the context of an assay described herein, the polypeptide remains bound
to the first site in the presence of an excess of the second site. The polypeptide may
not have sufficient affinity for the first site to bind alone, but may be assayed when
fused as in a hybrid polypeptide of the invention to another nucleic acid binding
domain that binds a nearby recruitment site.

As used herein, "degenerate oligonucleotides" refers to both (a) a population 
of different oligonucleotides, and (b) a single species of oligonucleotide that can 
anneal to more than one sequence, e.g., an oligonucleotide with an unnatural 
nucleotide such as inosine. 

The present invention provides numerous benefits. The ability to select a 
DNA binding domain that recognizes a particular sequence permits the design of 
novel polypeptides that bind to a specific site on a DNA. Thus, the invention facilitates
the customized generation of novel polypeptides that can regulate the expression of a 
selected target, e.g., a gene required by a pathogen can be repressed, a gene required 
for cancerous growth can be repressed, a gene poorly expressed or encoding a 
mutated protein can be activated and overexpressed, and so forth. 

The use of zinc finger domains is particularly advantageous. First, the zinc 
finger motif recognizes very diverse DNA sequences. Second, the structure of 
naturally occurring zinc finger proteins is modular. For example, the zinc finger 
protein Zif268, also called "Egr-1," is composed of a tandem array of three zinc finger 
domains. Fig. 1 is the x-ray crystallographic structure of zinc finger protein Zif268, 
consisting of three fingers complexed with DNA (Pavletich and Pabo, (1991) Science 
252:809-817). Each finger independently contacts 3-4 basepairs of the DNA 
recognition site. Hence, the subsite contacted by each finger can be regarded as an 
independent molecular recognition event. High affinity binding is achieved by the 
cooperative effect of having multiple zinc finger modules in the same polypeptide 
chain. 

The use of an in vivo selection step enables one to identify directly those 
polypeptides that bind to a specific site on a DNA in the intracellular milieu. The 
factors associated with recognition in a cell, particularly a eukaryotic cell, can be 
vastly different from the factors present during an in vitro selection scenario. For 
example, in a eukaryotic nucleus, a polypeptide must compete with the myriad other 
nuclear proteins for a specific nucleic acid binding site. A nucleosome or another 
chromatin protein can occupy, occlude, or compete for the binding site. Even if 
unbound, the conformation of a nucleic acid in the cell is subject to bending, 
supercoiling, torsion, and unwinding. Conversely, the polypeptide itself is exposed to
proteases and chaperones, among other factors. Moreover, the polypeptide is
confronted with an entire genome of possible binding sites, and hence must be
endowed with a high specificity for the desired site in order to survive the selection
process. In contrast to in vivo selection, an in vitro selection can select for the highest
affinity binder rather than the highest specificity binder.

The use of a reporter gene to indicate the binding ability of an expressed 
polypeptide chimera not only is efficient and simple, but also obviates the need to 
develop a complex interaction code that accounts for the energetics of the protein- 
nucleic acid interface and the immense number of peripheral factors, such as 
surrounding residues and nucleotides that also affect the binding interface (Segal et al.
(1999) Proc. Natl. Acad. Sci. USA 96:2758-2763).

The present invention avails itself of all the zinc finger domains present in the 
human genome, or any other genome. This diverse sampling of sequence space 
occupied by the zinc finger domain structural fold may have the additional advantages
inherent in eons of natural selection. Moreover, by utilizing domains from the host
species, a DNA binding protein engineered for a gene therapy application by the 
methods described herein has a reduced likelihood of being regarded as foreign by the 
host immune response. 

The details of one or more embodiments of the invention are set forth in the 
accompanying drawings and the description below. Other features, objects, and 
advantages of the invention will be apparent from the description and drawings, and 
from the claims.

DESCRIPTION OF DRAWINGS 

Fig. 1 is a depiction of the three dimensional structure of the Zif268 zinc 
finger protein that consists of three finger domains and binds the DNA sequence,
5'-GCG TGG GCG T-3'. The black circles represent the location of the zinc ion.

Fig. 2 is an illustration of the hydrogen-bonding interactions between amino 
acid residues of Zif268 and DNA bases. Amino acid residues at positions -1, 2, 3, and 
6 along the α-helix interact with the bases at specific positions. The bold lines
represent ideal hydrogen bonding, while the dotted lines represent potential hydrogen 
bonding. 

Fig. 3 is a recognition code table that summarizes the interactions between
DNA bases and amino acid residues at positions -1, 2, 3, and 6 along the α-helix of a
zinc finger domain.

Fig. 4 is a depiction of the positions of amino acid residues and their
corresponding 3 base triplets. The bold lines represent the main interactions observed, 
while the dotted line represents an auxiliary interaction. 

Fig. 5 is a diagram illustrating the principles of the in vivo selection system 
disclosed herein. Of the various zinc finger mutants, zinc finger domain A recognizes 
the target sequence (designated XXX X) and activates the transcription of HIS3 
reporter gene. As a result, yeast colonies grow on a medium lacking histidine. In 
contrast, zinc finger domain B does not recognize the target sequence and thus the 
reporter gene remains repressed. As a result, no colonies grow on a medium lacking 
histidine. AD represents the transcriptional activation domain. 

Fig. 6 is a list of 10-bp sequences found in long terminal repeats (LTR) of
HIV-1 and in the promoter region of CCR5, a human gene encoding a coreceptor for
HIV-1 (SEQ ID NOs:1-5, respectively). The underlined portions represent 4-bp
target sequences used in the present selection. 

Fig. 7 is a depiction of the base sequences of the binding sites linked to the 
reporter gene (SEQ ID NOs:6-17, respectively). Each binding site consists of a 
tandem array of 4 composite binding sequences. Each composite binding sequence 
was constructed by connecting truncated binding sequence 5'-GG GCG-3' recognized 
by finger 1 and finger 2 of Zif268 to 4-bp target sequences. 

Fig. 8 is a diagram of pPCFMS-Zif, a plasmid that can be used for the
construction of a library of hybrid plasmids (SEQ ID NOs:18 and 19). 

Fig. 9 is a representation of the base sequence for the gene coding for Zif268
zinc finger protein inserted into pPCFMS-Zif and the corresponding translated amino 
acid sequences (SEQ ID NOs:20 and 21, respectively). Sites recognized by restriction 
enzymes are underlined. 

Fig. 10 is a photograph of a culture plate having yeast cells obtained from 
retransformation and cross transformation using zinc finger proteins selected by the in 
vivo selection system. 

Fig. 11 is a list of some DNA sequences of zinc finger domains selected by the
in vivo system from a zinc finger library derived from the human genome and the amino
acid sequences encoded by the DNA sequences (SEQ ID NOs:22-33). The DNA
sequences corresponding to the degenerate PCR primers used to amplify DNA
segments encoding zinc finger domains from the human genome are underlined. The
four potential base-contacting positions are indicated, and the amino acid residues are
shown in bold. The two Cys residues and two His residues that are expected to
coordinate the zinc ion are shown in italics.

DETAILED DESCRIPTION 

The invention features a novel screening method for determining the nucleic
acid binding preferences of test zinc finger domains. The method is easily adapted to
a variety of DNA binding domains, a variety of sources for these domains, and a
number of library designs, reporter genes, and selection and screening systems. The
screening method can be implemented as a high-throughput platform. Information
obtained from the screening method is readily applied to a method of designing
artificial nucleic acid binding proteins. The design method appropriates the binding
preferences of test zinc finger domains to guide the modular assembly of a chimeric
nucleic acid binding protein. A designed protein can be further optimized or varied
with the screening method.

DNA binding domains 

The invention utilizes collections of nucleic acid binding domains with
differing binding specificities. A variety of protein structures are known to bind
nucleic acids with high affinity and high specificity. These structures are used
repeatedly in a myriad of different proteins to specifically control nucleic acid
function (for reviews of structural motifs which recognize double stranded DNA, see,
e.g., Pabo and Sauer (1992) Annu. Rev. Biochem. 61:1053-95; Patikoglou and Burley
(1997) Annu. Rev. Biophys. Biomol. Struct. 26:289-325; Nelson (1995) Curr. Opin.
Genet. Dev. 5:180-9). A few non-limiting examples of nucleic acid binding domains
include:

Zinc fingers. Zinc fingers are small polypeptide domains of approximately
30 amino acid residues in which there are four amino acids, either cysteine or
histidine, appropriately spaced such that they can coordinate a zinc ion (Fig. 1; for
reviews, see, e.g., Klug and Rhodes, (1987) Trends Biochem. Sci. 12:464-469;
Evans and Hollenberg, (1988) Cell 52:1-3; Payre and Vincent, (1988) FEBS Lett.
234:245-250; Miller et al., (1985) EMBO J. 4:1609-1614; Berg, (1988) Proc. Natl.
Acad. Sci. U.S.A. 85:99-102; Rosenfeld and Margalit, (1993) J. Biomol. Struct. Dyn.
11:557-570). Hence, zinc finger domains can be categorized according to the identity
of the residues that coordinate the zinc ion, e.g., as the Cys2-His2 class, the Cys2-Cys2
class, the Cys2-CysHis class, and so forth. The zinc coordinating residues of Cys2-His2
zinc fingers are typically spaced as follows:
X_a-X-C-X_{2-5}-C-X_3-X_a-X_5-ψ-X_2-H-X_{3-5}-H (SEQ ID NO:76),
where ψ (psi) is a hydrophobic residue (Wolfe et al., (1999) Annu. Rev.
Biophys. Biomol. Struct. 3:183-212), "X" represents any amino acid, X_a is
phenylalanine or tyrosine, the subscript indicates the number of amino acids, and
two subscripts indicate a typical range of intervening amino acids. Typically, the
intervening amino acids fold to form an anti-parallel β-sheet that packs against an
α-helix, although the anti-parallel β-sheets can be short, non-ideal, or non-existent.
The fold positions the zinc-coordinating side chains so they are in a tetrahedral
conformation appropriate for coordinating the zinc ion. The base contacting residues
are at the N-terminus of the finger and in the preceding loop region (Fig. 2). A zinc
finger DNA-binding protein normally consists of a tandem array of three or more zinc
finger domains.

The zinc finger domain (or "ZFD") is one of the most common eukaryotic 
DNA-binding motifs, found in species from yeast to higher plants and to humans. By 
one estimate, there are at least several thousand zinc finger domains in the human 
genome alone. Zinc finger domains can be isolated from zinc finger proteins.
Non-limiting examples of zinc finger proteins include CF2-II, Kruppel, WT1, basonuclin,
BCL-6/LAZ-3, erythroid Kruppel-like transcription factor, transcription factors Sp1,
Sp2, Sp3, and Sp4, transcriptional repressor YY1, EGR1/Krox24, EGR2/Krox20,
EGR3/Pilot, EGR4/AT133, Evi-1, GLI1, GLI2, GLI3, HIV-EP1/ZNF40, HIV-EP2,
KR1, ZfX, ZfY, and ZNF7.

Computational methods described below can be used to identify all zinc 
finger domains encoded in a sequenced genome or in a nucleic acid database. Any 
such zinc finger domain can be utilized. In addition, artificial zinc finger domains 
have been designed, e.g., using computational methods (e.g., Dahiyat and Mayo, 
(1997) Science 278:82-7). The zinc finger of Dahiyat and Mayo adopts the zinc 
finger fold, but does not contain a zinc ion in its core. Thus, it is a zinc finger by
structural similarity of its polypeptide backbone to the fold of naturally occurring zinc 
fingers, rather than by functional ability to coordinate a zinc ion. 

Homeodomains. Homeodomains are simple eukaryotic domains that consist
of an N-terminal arm that contacts the DNA minor groove, followed by three α-helices
that contact the major groove (for a review, see, e.g., Laughon, (1991) Biochemistry
30:11357-67). The third α-helix is positioned in the major groove and contains
critical DNA-contacting side chains. Homeodomains have a characteristic highly
conserved motif present at the turn leading into the third α-helix. The motif includes
an invariant tryptophan that packs into the hydrophobic core of the domain. This
motif is represented in the Prosite database (see http://www.expasy.ch/) as
PDOC00027 ([L/I/V/M/F/Y/G]-[A^
[L/I/V/M]-X(4)-[L/I/V]-[R/K/N/Q/E/S/T/A/I/Y]^^
X-[N/D/Q/T/A/H]-X(5)-[R/K/N/A/I/M/W]; SEQ ID NO:77). Homeodomains are
commonly found in transcription factors that determine cell identity and provide
positional information during organismal development. Such classical
homeodomains can be found in the genome in clusters such that the order of the
homeodomains in the cluster approximately corresponds to their expression pattern
along a body axis. Homeodomains can be identified by alignment with a
homeodomain, e.g., Hox-1, or by alignment with a homeodomain profile or a
homeodomain hidden Markov Model (HMM; see below), e.g., PF00046 of the Pfam
database or "HOX" of the SMART database (http://smart.embl-heidelberg.de/), or by
the Prosite motif PDOC00027 as mentioned above.

Helix-turn-helix proteins. This DNA binding motif is common among many
prokaryotic transcription factors. There are many subfamilies, e.g., the LacI family
and the AraC family, to name but a few. The two helices in the name refer to a first
α-helix that packs against and positions a second α-helix in the major groove of
DNA. These domains can be identified by alignment with an HMM, e.g., the HTH_ARAC,
HTH_ARSR, HTH_ASNC, HTH_CRP, HTH_DEOR, HTH_DTXR, HTH_GNTR,
HTH_ICLR, HTH_LACI, HTH_LUXR, HTH_MARR, HTH_MERR, and
HTH_XRE profiles available in the SMART database
(http://smart.embl-heidelberg.de/).

Helix-loop-helix proteins. This DNA binding domain is commonly found
among homo- and hetero-dimeric transcription factors, e.g., MyoD, fos, jun, E11, and
myogenin. The domain consists of a dimer, each monomer contributing two
α-helices and an intervening loop. The domain can be identified by alignment with an
HMM, e.g., the "HLH" profile available in the SMART database
(http://smart.embl-heidelberg.de/). Although helix-loop-helix proteins are typically
dimeric, monomeric versions can be constructed by engineering a polypeptide linker
between the two subunits such that a single open reading frame encodes both
subunits and the linker.



Identification of DNA-binding domains 

A variety of methods can be used to identify structural domains. 

Computational Methods. The amino acid sequence of a DNA binding 
domain isolated by a method described herein can be compared to a database of 
known sequences, e.g., an annotated database of protein sequences or an annotated 
database which includes entries for nucleic acid binding domains. In another 
implementation, databases of uncharacterized sequences, e.g., unannotated genomic, 
EST or full-length cDNA sequence; of characterized sequences, e.g., SwissProt or 
PDB; and of domains, e.g., Pfam, ProDom (http://www.toulouse.inra.fr/), and
SMART (Simple Modular Architecture Research Tool,
http://smart.embl-heidelberg.de/) can provide a source of nucleic acid binding domain sequences.
Nucleic acid sequence databases can be translated in all six reading frames for the 
purpose of comparison to a query amino acid sequence. Nucleic acid sequences that 
are flagged as encoding candidate nucleic acid binding domains can be amplified 
from an appropriate nucleic acid source, e.g., genomic DNA or cellular RNA. Such 
nucleic acid sequences can be cloned into an expression vector. The procedures for 
computer-based domain identification can be interfaced with an oligonucleotide 
synthesizer and robotic systems to produce nucleic acids encoding the domains in a 
high-throughput platform. Cloned nucleic acids encoding the candidate domains can 
also be stored in a host expression vector and shuttled easily into an expression vector, 
e.g., into a translational fusion vector with Zif268 fingers 1 and 2, either by restriction 
enzyme mediated subcloning or by site-specific, recombinase mediated subcloning 
(see U.S. Patent No. 5,888,732). The high-throughput platform can be used to
generate multiple microtitre plates containing nucleic acids encoding different 
candidate nucleic acid binding domains. 
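
The six-frame translation step mentioned above can be sketched as follows. This is a minimal illustration assuming the standard genetic code and an unambiguous A/C/G/T input; the function names and the example sequence are hypothetical and are not taken from any tool cited herein.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand_name, strand in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand_name}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translations("ATGTGTCAT")  # hypothetical input
```

Each translated frame can then be compared to a query amino acid sequence or profile by any of the search methods described herein.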

Detailed methods for the identification of domains from a starting sequence or 
a profile are well known in the art. See, for example, Prosite (Hofmann et al., (1999)
Nucleic Acids Res. 27:215-219), FASTA, BLAST (Altschul et al., (1990) J. Mol. Biol.
215:403-10), etc. A simple string search can be done to find amino acid sequences
with identity to a query sequence or a query profile, e.g., using Perl 
(http://bio.perl.org/) to scan text files. Sequences so identified can be about 30%, 
40%, 50%, 60%, 70%, 80%, 90%, or greater identical to an initial input sequence. 
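
A simple identity-based string search of the kind described above can be sketched in Python as follows. The sketch assumes an ungapped, sliding-window comparison, which is a simplification relative to FASTA or BLAST; the function names and the 30% default cutoff are illustrative choices.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def scan_for_similar(query, subject, min_identity=30.0):
    """Slide the query along the subject and report windows at or above the cutoff."""
    hits = []
    width = len(query)
    for start in range(len(subject) - width + 1):
        window = subject[start:start + width]
        identity = percent_identity(query, window)
        if identity >= min_identity:
            hits.append((start, window, identity))
    return hits


# Hypothetical query and subject strings, for illustration only.
hits = scan_for_similar("CPEC", "AACPECAA", min_identity=100.0)
```

Lowering `min_identity` toward 30% returns more distant candidates at the cost of more spurious windows.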

Domains similar to a query domain can be identified from a public database, 
e.g., using the XBLAST programs (version 2.0) of Altschul et al., (1990) J. Mol. Biol.
215:403-10. For example, BLAST protein searches can be performed with the
XBLAST parameters as follows: score = 50, wordlength = 3. Gaps can be introduced
into the query or searched sequence as described in Altschul et al., (1997) Nucleic
Acids Res. 25(17):3389-3402. Default parameters for XBLAST and Gapped BLAST
programs are available at http://www.ncbi.nlm.nih.gov. 

The Prosite profiles PS00028 and PS50157 can be used to identify zinc finger 
domains. In a SWISSPROT release of 80,000 protein sequences, these profiles 
detected 3189 and 2316 zinc finger domains, respectively. Profiles can be constructed 
from a multiple sequence alignment of related proteins by a variety of different 
techniques. Gribskov and co-workers (Gribskov et al., (1990) Meth. Enzymol.
183:146-159) utilized a symbol comparison table to convert a multiple sequence
alignment supplied with residue frequency distributions into weights for each position.
See, for example, the PROSITE database and the work of Luethy et al., (1994)
Protein Sci. 3:139-146.
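
The idea of converting a multiple sequence alignment into per-position weights can be illustrated with a simplified sketch. Note that this is not the Gribskov method: instead of a symbol comparison table it uses log-odds of column frequencies against a uniform background, with a pseudocount, and it assumes an ungapped alignment. The names and the three-residue toy alignment are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds weight} dictionary per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequency
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, segment):
    """Sum the position weights for a segment as long as the profile."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


profile = build_profile(["CAH", "CGH", "CSH"])  # toy alignment
```

Residues that are conserved in a column receive positive weights, so segments resembling the alignment score higher than unrelated segments.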

Hidden Markov Models (HMMs) representing a DNA binding domain of
interest can be generated or obtained from a database of such models, e.g., the Pfam
database, release 2.1. A database can be searched, e.g., using the default parameters,
with the HMM in order to find additional domains (see, e.g.,
http://www.sanger.ac.uk/Software/Pfam/HMM_search for default parameters).
Alternatively, the user can optimize the parameters. A threshold score can be selected
to filter the database of sequences such that sequences that score above the threshold
are displayed as candidate domains. A description of the Pfam database can be found
in Sonnhammer et al., (1997) Proteins 28(3):405-420, and a detailed description of
HMMs can be found, for example, in Gribskov et al., (1990) Meth. Enzymol.
183:146-159; Gribskov et al., (1987) Proc. Natl. Acad. Sci. USA 84:4355-4358;
Krogh et al., (1994) J. Mol. Biol. 235:1501-1531; and Stultz et al., (1993) Protein Sci.
2:305-314.

The SMART database of HMMs (Simple Modular Architecture Research
Tool, http://smart.embl-heidelberg.de/; Schultz et al., (1998) Proc. Natl. Acad. Sci.
USA 95:5857 and Schultz et al., (2000) Nucl. Acids Res. 28:231) provides a catalog of
zinc finger domains (ZnF_C2H2; ZnF_C2C2; ZnF_C2HC; ZnF_C3H1; ZnF_C4;
ZnF_CHCC; ZnF_GATA; and ZnF_NFX) identified by profiling with the hidden
Markov models of the HMMer2 search program (Durbin et al., (1998) Biological
sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge
University Press; http://hmmer.wustl.edu/).

Hybridization-based Methods. A collection of nucleic acids encoding
various forms of a DNA binding domain can be analyzed to profile sequences
encoding conserved amino- and carboxy-terminal boundary sequences. Degenerate
oligonucleotides can be designed to hybridize to sequences encoding such conserved
boundary sequences. Moreover, the efficacy of such degenerate oligonucleotides can
be estimated by comparing their composition to the frequency of possible annealing
sites in known genomic sequences. Multiple rounds of design can be used to optimize
the degenerate oligonucleotides. For example, comparison of known Cys2-His2 zinc
fingers revealed a common sequence in the linker region between adjacent fingers in
natural sequence (Agata et al., (1998) Gene 213:55-64). Such degenerate
oligonucleotides are used to amplify a plurality of DNA binding domains. The
amplified domains are inserted as test zinc finger domains into the hybrid nucleic acid,
and subsequently assayed for binding to a target site by the methods described herein.
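
One way to estimate how often a degenerate oligonucleotide would find an annealing site is to expand it with the IUPAC ambiguity codes and count exact matches in a known sequence. The following is a minimal sketch under stated assumptions: it scans only the strand supplied, requires a perfect match at every position, and ignores hybridization thermodynamics. The function names and example strings are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    n = 1
    for symbol in oligo.upper():
        n *= len(IUPAC[symbol])
    return n


def count_annealing_sites(oligo, genome):
    """Count positions (overlaps included) where the oligo matches the given strand."""
    pattern = "".join(f"[{IUPAC[symbol]}]" for symbol in oligo.upper())
    # A zero-width lookahead lets overlapping sites be counted separately.
    return len(re.findall(f"(?={pattern})", genome.upper()))


sites = count_annealing_sites("AR", "AAGAT")  # toy example
```

Comparing the site count with the degeneracy for several candidate designs gives a rough basis for the rounds of optimization described above.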



Library Design

The method permits the screening of a collection of nucleic acids encoding
DNA binding domains (for example, in the form of a plasmid, phagemid, or phage
library) for functional nucleic acid binding properties. The collection can encode a
diverse group of DNA binding domains, even domains of different structural folds. In
one instance, the collection encodes domains of a single structural fold such as a zinc
finger domain. Although the following methods are described in the context of zinc
finger domains, one skilled in the art would be able to adapt them to other types of
nucleic acid binding domains.

Mutated Domains. In still another instance, the collection is composed of
nucleic acids encoding a structural domain that is assembled from a degenerate
patterned library. For example, in the instance of zinc fingers, an alignment of known
zinc fingers can be utilized to identify the optimal amino acids at each position.
Alternatively, structural studies and mutagenesis experiments can be used to
determine the preferred properties of amino acids at each position. Any nucleic acid
binding domain can be used as a structural scaffold for introducing mutations. In
particular, positions in close proximity to the nucleic acid binding interface or
adjacent to a position so located can be targeted for mutagenesis. A mutated test zinc
finger domain can be constrained at any mutated position to a subset of possible
amino acids by using a patterned degenerate library. Degenerate codon sets can be
used to encode the profile at each position. For example, codon sets are available that
encode only hydrophobic residues, aliphatic residues, or hydrophilic residues. The
library can be selected for full-length clones that encode folded polypeptides. Cho et
al. ((2000) J. Mol. Biol. 297(2):309-19) provides a method for producing such
degenerate libraries using degenerate oligonucleotides, and also provides a method of
selecting library nucleic acids that encode full-length polypeptides. Such nucleic
acids can be easily inserted into an expression plasmid using convenient restriction
enzyme cleavage sites or transposase or recombinase recognition sites for the
selection methods described herein.

Selection of the appropriate codons and the relative proportions of each
nucleotide at a given position can be determined by simple examination of a table
representing the genetic code, or by computational algorithms. For example, Cho et
al., supra, describe a computer program that accepts a desired degenerate protein
sequence and outputs a preferred oligonucleotide design that encodes the sequence.
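
Examining the genetic code for a degenerate codon can be automated in a few lines. The sketch below is not the program of Cho et al.; it is a hypothetical helper that expands a degenerate codon written in IUPAC ambiguity codes and reports which residues (or stop, shown as "*") the codon set encodes under the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(degenerate_codon):
    """List every concrete codon represented by a three-letter degenerate codon."""
    choices = [IUPAC[symbol] for symbol in degenerate_codon.upper()]
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon):
    """Set of amino acids (and '*' for stop) encoded by the degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate_codon)}


hydrophobic_set = encoded_residues("NTN")
```

Here `NTN` yields only Phe, Leu, Ile, Met, and Val, an example of a codon set restricted to hydrophobic residues, whereas `NNK` covers all twenty amino acids with 32 codons and a single stop codon.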

Isolation of a natural repertoire of domains. A library of domains can be
constructed from genomic DNA or cDNA of eukaryotic organisms such as humans.
Multiple methods are available for doing this. For example, a computer search of
available amino acid sequences can be used to identify the domains, as described
above. A nucleic acid encoding each domain can be isolated and inserted into a
vector appropriate for expression in cells, e.g., a vector containing a promoter, an
activation domain, and a selectable marker. In another example, degenerate
oligonucleotides that hybridize to a conserved motif are used to amplify, e.g., by PCR,
a large number of related domains containing the motif. For example, Kruppel-like
Cys2-His2 zinc fingers can be amplified by the method of Agata et al., (1998) Gene
213:55-64. This method also maintains the naturally occurring zinc finger domain
linker peptide sequences, e.g., sequences with the pattern: Thr-Gly-(Glu/Gln)-
(Lys/Arg)-Pro-(Tyr/Phe) (SEQ ID NO:78). Moreover, screening a collection limited
to domains of interest, unlike screening a library of unselected genomic or cDNA
sequences, significantly decreases library complexity and reduces the likelihood of
missing a desirable sequence due to the inherent difficulty of completely screening
large libraries.
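
The linker pattern Thr-Gly-(Glu/Gln)-(Lys/Arg)-Pro-(Tyr/Phe) translates directly into a regular expression over one-letter amino acid codes. The following sketch locates such linkers in a protein sequence; the function name and the example sequence are hypothetical.

```python
import re

# Thr-Gly-(Glu/Gln)-(Lys/Arg)-Pro-(Tyr/Phe) in one-letter code.
LINKER = re.compile(r"TG[EQ][KR]P[YF]")


def find_linkers(protein):
    """Return (start, matched_text) for each linker occurrence in the sequence."""
    return [(m.start(), m.group()) for m in LINKER.finditer(protein.upper())]


linkers = find_linkers("AAHTGEKPYKCAAHTGQRPFAA")  # made-up sequence
```

Sequence segments lying between consecutive linker matches are candidate positions for individual zinc finger domains.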

The human genome contains numerous zinc finger domains, many of which
are uncharacterized and unidentified. It is estimated that there are thousands of genes
encoding proteins with zinc finger domains (Pellegrino and Berg, (1991) Proc. Natl.
Acad. Sci. USA 88:671-675). These human zinc finger domains represent an
extensive collection of diverse domains from which novel DNA-binding proteins can
be constructed. If each zinc finger domain recognizes a unique 3- to 4-bp sequence,
the total number of domains required to bind every possible 3- to 4-bp sequence is
only 64 to 256 (4^3 to 4^4). It is possible that the natural repertoire of the human genome
contains a sufficient number of unique zinc finger domains to span all possible
recognition sites. These zinc finger domains are a valuable resource for constructing
artificial chimeric DNA-binding proteins. Naturally occurring zinc finger domains
derived from the human genome, unlike artificial mutants, have evolved under natural
selective pressures and therefore may be naturally optimized for binding specific
DNA sequences and for in vivo function.
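
The count of possible target sites follows from four bases at each position. As a brief sketch, the complete set of 3-bp or 4-bp targets can be enumerated as shown; the function name is illustrative.

```python
from itertools import product


def all_targets(length):
    """Enumerate every DNA sequence of the given length over A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


three_bp = all_targets(3)   # 4**3 sequences
four_bp = all_targets(4)    # 4**4 sequences
```

Such a list can serve as a checklist of target sites for which a recognizing domain has or has not yet been found.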

Human zinc finger domains are much less likely to induce an immune 
response when introduced into humans, e.g., in gene therapy applications. 

In vivo Selection of Zinc Finger Domains Possessing Specific DNA Binding
Properties

Zinc finger domains with desired DNA recognition properties can be
identified using the following in vivo screening system. A composite binding site of
interest is inserted upstream of a reporter gene such that recruitment of a
transcriptional activation domain to the composite binding site results in increased
reporter gene transcription above a given level. An expression plasmid that encodes a
hybrid protein consisting of a test zinc finger domain fused to a fixed DNA binding
domain and a transcriptional activation domain is constructed.

The composite binding site includes at least two elements, a recruitment site
and a target site. The system is engineered such that the fixed DNA binding domain
recognizes the recruitment site. However, the binding affinity of the fixed DNA
binding domain for the recruitment site is such that in vivo it alone is insufficient for
transcriptional activation of the reporter gene. This can be verified by a control
experiment.

For example, when expressed in cells, the fixed DNA binding domain (in the
absence of a test zinc finger domain, or in the presence of a test zinc finger domain
that is known to be nonfunctional or whose known DNA contacting residues have
been replaced with an alternative amino acid such as alanine) should not be able to
activate transcription of the reporter gene above a nominal level. Some leaky or
low-level activation is tolerable, as the system can be sensitized by other means (e.g., by
use of a competitive inhibitor for the reporter). The fixed DNA binding domain is
expected not to bind stably to the recruitment site. For example, the fixed DNA
binding domain can bind to the recruitment site with a dissociation constant (Kd) of
approximately 0.1 nM, 1 nM, 1 μM, 10 μM, 100 μM, or greater. The Kd of the DNA
binding domain for the target site can be measured in vitro by an electrophoretic
mobility shift assay (EMSA) in the absence of a test zinc finger domain or in the
absence of a test zinc finger domain with specificity for the second target site.

Thus, attachment of a functional test zinc finger domain that recognizes the
target site, e.g., the variable site of the composite binding site, is necessary for the
hybrid protein to bind stably to the composite binding site in cells, and thereby to
activate the reporter gene. The binding preference of the test zinc finger domain for
the target site results in an increase in reporter gene expression relative to the given
level. For example, the fold increase of reporter gene expression obtained by dividing
the observed level by the given level can be approximately 2, 4, 8, 20, 50, 100, 1000
fold or greater. When the test zinc finger domain recognizes the target site, the Kd of
the transcription factor comprising the DNA binding domain and the test zinc finger
domain is decreased, e.g., relative to a transcription factor lacking a test zinc finger
domain with specificity for the target site. For example, the dissociation constant (Kd)
of a transcription factor complexed to a target site for which it has specificity can be
approximately 50 nM, 10 nM, 1 nM, 0.1 nM, 0.01 nM or less. The Kd can be
determined in vitro by EMSA.
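
The relationship between a lower Kd and more stable site occupancy can be illustrated with a simple calculation. The sketch below assumes a 1:1 equilibrium binding isotherm and a hypothetical free protein concentration of 10 nM; neither assumption comes from the experiments described herein, and occupancy is only a rough proxy for reporter gene expression.

```python
def fraction_bound(protein_conc_m, kd_m):
    """Fraction of sites occupied at equilibrium for simple 1:1 binding."""
    return protein_conc_m / (protein_conc_m + kd_m)


def occupancy_ratio(kd_without_m, kd_with_m, protein_conc_m):
    """Ratio of site occupancy with versus without a functional test domain."""
    with_domain = fraction_bound(protein_conc_m, kd_with_m)
    without_domain = fraction_bound(protein_conc_m, kd_without_m)
    return with_domain / without_domain


# Hypothetical values: 1 micromolar Kd for the fixed domain alone,
# 1 nanomolar Kd for the hybrid protein, 10 nanomolar free protein.
weak = fraction_bound(10e-9, 1e-6)
strong = fraction_bound(10e-9, 1e-9)
ratio = occupancy_ratio(1e-6, 1e-9, 10e-9)
```

Under these illustrative numbers the fixed domain alone occupies about 1% of sites, whereas the hybrid protein occupies roughly 90%, which is the kind of difference the selection relies upon.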

The discovery that DNA binding specificity can be sensitively and accurately 
assayed by determining the ability of test zinc finger domains to augment the in vivo 
binding affinity of a fixed DNA binding domain has enabled the rapid isolation and 
characterization of novel zinc finger domains from the human genome. 

Fixed DNA binding domains include modular domains isolated from naturally 
occurring DNA-binding proteins, e.g., a naturally occurring DNA-binding protein that 
has multiple domains or that is an oligomer. For example, both of two known zinc 
fingers, e.g., fingers 1 and 2 of Zif268, can be used as the fixed DNA binding domain. 
A skilled artisan would be able to identify from the myriad of nucleic acid binding 
domains (e.g., a domain family described herein, such as a homeodomain, a helix- 
turn-helix domain, or a helix-loop-helix domain, or a nucleic acid binding domain 
well characterized in the art) a fixed DNA binding domain suitable for the system. 
Appropriate selection of a recruitment site that is recognized by the fixed DNA
binding domain is also necessary. The recruitment site can be a subsite within the 
natural binding site for the naturally occurring DNA binding protein from which the 
fixed DNA binding domain is obtained. If necessary, mutations can be introduced 
either into the fixed domain or into the recruitment site, in order to sensitize the 
system. 

Cells suitable for the in vivo screening system include both eukaryotic and 
prokaryotic cells. Exemplary eukaryotic cells include yeast cells, e.g., Saccharomyces 
cerevisiae, Schizosaccharomyces pombe, and Pichia pastoris cells.

The yeast one-hybrid system, using Saccharomyces cerevisiae, was modified
to select zinc finger domains using the aforementioned screening system. First,
reporter plasmids that encode the HIS3 reporter gene were prepared. The
predetermined 4-bp target DNA sequences were connected to a truncated binding
sequence to provide composite binding sequences for the DNA-binding domains, and
each of the composite binding sequences was operably linked to the reporter gene on
separate plasmids.

The hybrid nucleic acid sequence encodes a transcriptional activation domain 
linked to a DNA-binding domain comprising a truncated DNA-binding domain and a 
zinc finger domain. 

The binding sites used herein are not necessarily contiguous, although
contiguous sites are frequently used. Flexible and/or extensible linkers between
nucleic acid binding domains can be used to construct proteins that recognize
non-contiguous sites.

According to one aspect of the present invention, a polypeptide composed of
finger 1 and finger 2 of Zif268 and devoid of finger 3 can be used as a fixed
DNA-binding domain. (Among the three zinc finger domains of Zif268, finger 1 refers to the
zinc finger domain located at the N-terminal end, finger 2 to the zinc finger domain in the
middle, and finger 3 to the zinc finger domain at the C-terminal end.) Alternately, any two
zinc finger domains whose binding site is characterized can be used as a fixed
DNA-binding domain.

Other useful fixed DNA-binding domains may be derived from other zinc
finger proteins, such as Sp1, CF2-II, YY1, Kruppel, WT1, Egr2, or POU-domain
proteins, such as Oct1, Oct2, and Pit1. These are provided by way of example and the
present invention is not limited thereto.

According to one particular example of the present invention, the base
sequence 5'-GGGCG-3', generated by deleting 4 bp from the 5' end of the optimal
Zif268 recognition sequence (5'-GCG TGG GCG-3'), can be used as a recruitment site.
Any target sequence of 3 to 4 bp can be linked to this recruitment site, to yield a
composite binding sequence.
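
The construction of a composite binding sequence can be sketched as simple string operations. The sketch places the target sequence 5' of the recruitment site, following the 5'-XXXXGGGCG-3' arrangement described for Fig. 5, and builds a four-copy tandem array as described for Fig. 7. Concatenating the copies directly, without spacer nucleotides, is an assumption made for illustration, and the function names and example target are hypothetical.

```python
ZIF268_SITE = "GCGTGGGCG"            # optimal Zif268 recognition sequence, 5' to 3'
RECRUITMENT_SITE = ZIF268_SITE[4:]   # delete 4 bp from the 5' end


def composite_site(target):
    """Link a 3- or 4-bp target sequence to the 5' side of the recruitment site."""
    target = target.upper()
    if not 3 <= len(target) <= 4 or set(target) - set("ACGT"):
        raise ValueError("target must be 3 or 4 bases drawn from A, C, G, T")
    return target + RECRUITMENT_SITE


def tandem_array(target, copies=4):
    """Repeat the composite site head to tail (no spacers: an assumption)."""
    return composite_site(target) * copies


site = composite_site("gaca")  # hypothetical 4-bp target
```

One such array per target sequence would then be placed upstream of the reporter gene on a separate reporter plasmid.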

Activation domains. Transcriptional activation domains that may be used in
the present invention include but are not limited to the Gal4 activation domain from
yeast and the VP16 domain from herpes simplex virus. In bacteria, activation domain
function can be emulated by fusing a domain that can recruit a wild-type RNA
polymerase alpha subunit C-terminal domain or a mutant alpha subunit C-terminal
domain, e.g., a C-terminal domain fused to a protein interaction domain.

Repression domains. If desired, a repression domain instead of an activation 
domain can be fused to the DNA binding domain. Examples of eukaryotic repression 
domains include ORANGE, groucho, and WRPW (Dawson et al., (1995) Mol. Cell
Biol. 15:6923-31). When a repression domain is used, a toxic reporter gene and/or a
non-selectable marker can be used to screen for decreased expression. 

Reporter genes. The reporter gene can be a selectable marker, e.g., a gene 
that confers drug resistance or an auxotrophic marker. Examples of drug resistance 
genes include S. cerevisiae cycloheximide resistance (CYH), S. cerevisiae canavanine
resistance gene (CAN1), and the hygromycin resistance gene. S. cerevisiae 
auxotrophic markers include the URA3, HIS3, LEU2, ADE2 and TRP1 genes. When 
an auxotrophic marker is the reporter gene, cells that lack a functional copy of the 
auxotrophic gene and so the ability to produce a particular metabolite are utilized. 
Selection for constructs encoding test zinc finger domains that bind a target site is 
achieved by maintaining the cells in medium lacking the metabolite. For example, the 
HIS3 gene can be used as a selectable marker in combination with a his3⁻ yeast strain.
After introduction of constructs encoding the hybrid transcription factors, the cells are 
grown in the absence of histidine. Selectable markers for use in mammalian cells, 
such as thymidine kinase, neomycin resistance, and HPRT, are also well known to the 
skilled artisan. 

Alternatively, the reporter gene encodes a protein whose presence can be 
easily detected and/or quantified. Exemplary reporter genes include lacZ, 
chloramphenicol acetyl transferase (CAT), luciferase, green fluorescent protein (GFP), 
beta-glucuronidase (GUS), blue fluorescent protein (BFP), and derivatives of GFP, 
e.g., with altered or enhanced fluorescent properties (Clontech Laboratories, Inc. CA). 
Colonies of cells expressing lacZ can be easily detected by growing the colonies on 
plates containing the colorimetric substrate X-gal. GFP expression can be detected by 
monitoring fluorescence emission upon excitation. Individual GFP expressing cells 
can be identified and isolated using fluorescence activated cell sorting (FACS). 

The system can be constructed with two reporter genes, e.g., a selectable
reporter gene and a non-selectable reporter gene. The selectable marker facilitates
rapid identification of the domain of interest, as under the appropriate growth
conditions, only cells bearing the domain of interest grow. The non-selectable
reporter provides a means of verification, e.g., to distinguish false-positives, and a
means of quantifying the extent of binding. The two reporters can be integrated at
separate locations in the genome, integrated in tandem in the genome, contained on
the same extrachromosomal element (e.g., plasmid) or contained on separate
extrachromosomal elements.

Fig. 5 illustrates the principle of the modified one-hybrid system used to select
desired zinc finger domains. The DNA-binding domain of the hybrid transcription
factor is composed of (a) a truncated DNA-binding domain consisting of finger 1 and
finger 2 of Zif268 and (b) zinc finger domain A or B. The base sequence of the
binding site located at the promoter region of the reporter gene is a composite binding
sequence (5'-XXXXGGGCG-3'), which consists of a 4-bp target sequence
(nucleotides 1 to 4, 5'-XXXX-3') and a truncated binding sequence (nucleotides 5 to
9, 5'-GGGCG-3').

If the test zinc finger domain (A in Fig. 5) in the hybrid transcription factor
recognizes the target sequence, the hybrid transcription factor can bind the composite
binding sequence stably. This stable binding leads to expression of the reporter gene
through the action of the activation domain (AD in Fig. 5) of the hybrid transcription
factor. As a result, when HIS3 is used as a reporter gene, the transformed yeast grows
in medium devoid of histidine. Alternatively, when lacZ is used as a reporter gene,
the transformed yeast grows as a blue colony in a medium containing X-gal, a
substrate of the lacZ protein. However, if the zinc finger domain (B in Fig. 5) of the
hybrid transcription factor fails to recognize the target sequence, expression of the
reporter gene is not induced. As a result, the transformed yeast cannot grow in the
medium devoid of histidine (when HIS3 is used as a reporter gene) or grows as a
white colony in a medium containing X-gal (when lacZ is used as a reporter gene).

The selection method using this modified one-hybrid system is advantageous 
because zinc finger domains selected by virtue of this procedure are demonstrated to 
function in the cellular milieu. Thus, the domains are presumably able to fold, enter 
the nucleus, and withstand intracellular proteases and other potentially damaging 
intracellular agents. Furthermore, the modified one-hybrid system disclosed herein 
allows the isolation of desired zinc finger domains quickly and easily. The modified 
one-hybrid system requires only a single round of transformation of yeast cells to 
isolate the desired zinc finger domains. 

The selection method described herein can be utilized to identify a zinc finger 
domain from a genome, e.g., a genome of a plant or animal species (e.g., a mammal, 
e.g., a human). The method can also be utilized to identify a zinc finger domain from 
a library of mutant zinc finger domains prepared, for example, by random 
mutagenesis. In addition, the two methods can be used in conjunction. For example, 
if a zinc finger domain cannot be isolated from the human genome for a particular 
3-bp or 4-bp DNA sequence, a library of zinc finger domains prepared by random or 
directed mutagenesis can be screened for such a domain. 

Although the modified one-hybrid system in yeast is a preferred means to 
select zinc finger domains that recognize and bind the given target sequences, it will 
be apparent to a person skilled in the art that systems other than yeast one-hybrid 
selection can be used. For example, phage display selection may be used to screen a 
library of naturally occurring zinc finger domains derived from a genome of a 
eukaryotic organism. 

The present invention encompasses the use of the one-hybrid method in a 
variety of cultured cells. For example, a reporter gene operably linked to target 
sequences may be introduced into prokaryotic or animal or plant cells in culture, and 
the cultured cells may then be transfected with plasmids, phages, or viruses encoding 
a library of zinc finger domains. Desired zinc finger domains recognizing target 
sequences may then be obtained from the isolated cells in which the reporter gene is 
activated. 

The examples disclosed below demonstrate that the method can identify zinc 
finger domains for binding sites of interest. A library of hybrid transcription factors 
with a variety of zinc finger domains positioned at finger 3 was prepared. Of the 
novel zinc finger domains (e.g., HSNK, QSTV, and VSTR zinc fingers; see below) 
selected from the library, none is naturally located at the C-terminus in its 
corresponding parent zinc finger protein. This clearly demonstrates that zinc finger 
domains are modular and that novel DNA-binding domains can be constructed by 
mixing and matching appropriate zinc finger domains. 

The zinc finger domains selected via the method of the present invention can 
be used as building blocks to make new DNA-binding proteins by appropriate 
rearrangement and recombination. For example, a novel DNA-binding protein 
recognizing the promoter region of human CCR5, a coreceptor of HIV-1, can be 
constructed as follows. The promoter region of human CCR5 contains the following 
10-bp sequence: 5'-AGG GTG GAG T-3' (SEQ ID NO:4) (Fig. 6). Using the modified 
one-hybrid system disclosed herein, one should be able to isolate three zinc finger 
domains, each of which specifically recognizes one of the following 4-bp target 
sequences: 5'-AGGG-3', 5'-GTGG-3', and 5'-GAGT-3'. These target sequences are 
overlapping 4-bp segments of the CCR5 target sequence. These three zinc finger 
domains can be connected with appropriate linkers and attached to a regulatory domain, 
e.g., an activation domain such as the VP16 domain or the GAL4 domain, or a 
repression domain such as the KRAB domain, in order to generate novel transcription 
factors that specifically bind to the CCR5 promoter. These zinc finger proteins could 
be used in gene therapy to help prevent proliferation of HIV-1. 
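
The overlapping parse of the CCR5 site can be illustrated with a short Python sketch. 
The function name `overlapping_subsites` and its default parameters are assumptions 
chosen for illustration; only the 10-bp sequence and the three 4-bp segments come 
from the text above. 

```python
# Illustrative sketch: split a 10-bp site into overlapping 4-bp target
# sequences whose start positions are spaced 3 bp apart.
def overlapping_subsites(site, width=4, step=3):
    """Return the overlapping `width`-bp segments of `site` (hypothetical helper)."""
    site = site.replace(" ", "").upper()
    return [site[i:i + width] for i in range(0, len(site) - width + 1, step)]


ccr5_site = "AGG GTG GAG T"  # SEQ ID NO:4
print(overlapping_subsites(ccr5_site))
```

Adjacent segments share one base, so each 4-bp target overlaps its neighbor at the 
boundary between consecutive 3-bp subsites. 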

High Throughput Screening 

The following method allows rapid measurement of the relative in vivo 
binding affinity of each domain in a collection for multiple possible DNA-binding 
sites or even all possible DNA-binding sites. A large collection of nucleic acids 
encoding nucleic acid binding domains is generated. Each nucleic acid binding 
domain is encoded as the test zinc finger domain in a hybrid nucleic acid construct, 
and expressed in a yeast strain of one mating type. Thus, a first set of yeast strains 
expressing all available or desired domains is generated. A second set of yeast strains 
containing reporter constructs for putative target sites of the domains is constructed 
in the opposite mating type. The method requires performing 
many or all of the possible pairwise matings in order to create a matrix of fused cells, 
each having a different test zinc finger domain and a different target site reporter 
construct. Each fused cell is assayed for reporter gene expression. The method 
thereby rapidly determines the binding preferences of the tested domains. 
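
The bookkeeping for such a mating matrix can be sketched in Python as follows. This is 
a minimal sketch under stated assumptions: the strain names, the signal values, and the 
threshold are placeholders, and the helper names `mating_matrix` and `call_binders` are 
hypothetical. 

```python
from itertools import product


def mating_matrix(domain_strains, reporter_strains):
    """List every pairwise cross of a domain strain with a reporter strain."""
    return list(product(domain_strains, reporter_strains))


def call_binders(signal, threshold):
    """Keep the crosses whose reporter signal meets a user-chosen threshold."""
    return {pair for pair, value in signal.items() if value >= threshold}


# Placeholder strain names; a real collection could hold thousands of domains.
domains = ["domain_1", "domain_2"]
reporters = ["site_AGGG", "site_GTGG", "site_GAGT"]
crosses = mating_matrix(domains, reporters)
print(len(crosses), "crosses to assay")
```

Each tuple corresponds to one well of the matrix, so the number of assays grows as the 
product of the two collection sizes. 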

A collection of domains is identified, e.g., by searching a genomics database 
for putative domains that fit a given profile. The collection can include, for example, 
ten to twenty domains, or all the identified domains, possibly thousands or more. 
Nucleic acids encoding the domains identified from the database are amplified using 
synthetic oligonucleotides. Manual and automated methods for designing such 
synthetic oligonucleotides are routine in the art. Nucleic acids encoding additional 
domains can be amplified with degenerate primers. Nucleic acids encoding the 
domains of the collection are cloned into the yeast expression plasmid described 
above, thus creating fusion proteins of the domains and the first two fingers of Zif268 
and a transcription activation domain. The amplification and cloning steps can be 
done in a microtitre plate format in order to clone nucleic acids encoding the multiple 
domains. 

Alternatively, a recombinational cloning method can be used to rapidly insert 
multiple amplified nucleic acids encoding the domains into the yeast expression 
vector. This method, which is described in U.S. Patent No. 5,888,732 and the 
"Gateway" manual (Life Technologies-Invitrogen, CA, USA), entails including 
customized sites for a site-specific recombinase at the ends of the amplification 
primers. The expression vector contains an additional site or sites at the position for 
insertion of amplified nucleic acid encoding the domain. These sites are designed to 
lack stop codons. Addition of the amplification product, the expression vector, and 
the site-specific recombinase to the recombination reaction results in insertion of the 
amplified sequence into the vector. Additional features, e.g., the displacement of a 
toxic gene upon successful insertion, make this method highly efficient and suitable 
for high throughput cloning. 

Restriction enzyme-mediated and/or recombination cloning can be used to 
insert nucleic acids encoding each of the identified domains into an expression vector. 
The vectors can be propagated in bacteria, and frozen in indexed microtitre plates, 
such that each well contains a cell harboring a nucleic acid encoding one of the 
different, unique DNA-binding domains. 

Isolated plasmid DNA is obtained for each domain and transformed into a 
yeast cell, e.g., a Saccharomyces cerevisiae MATa cell. As the expression vector 
contains a selectable marker, the transformed cells are grown in minimal medium 
under nutritional conditions selecting for the marker. Such cells can also be frozen 
and stored, e.g., in microtitre plates, for later use. 

A second set of yeast strains is constructed, e.g., in a Saccharomyces 
cerevisiae MATα cell. This set of yeast strains contains a variety of different reporter 
vectors. Each yeast strain bearing an expression vector with a unique DNA-binding 
domain is then mated to each yeast strain of the reporter gene set. As these two 
strains are from opposite mating types and are engineered to have different 
auxotrophies, diploids can easily be selected. Such diploids have both the reporter 
and the expression plasmids. The cells are also maintained under nutritional 
conditions that select for both the reporter and the expression plasmids. Uetz et al. 
(2000) Nature 403:623-7 describe a complete two-hybrid map of all yeast proteins by 
generating such a matrix of yeast matings. 

Reporter gene expression can be detected in a high-volume format, e.g., in 
microtitre plates. For example, when using GFP as the reporter, a plate containing the 
matrix of mated cells can be scanned for fluorescence. 



Modular Assembly of Novel DNA-Binding Proteins 

A new DNA-binding protein can be rationally constructed to recognize a 
target 9-bp or longer DNA sequence by mixing and matching appropriate zinc finger 
domains. The modular structure of zinc finger domains facilitates their rearrangement to 
construct new DNA-binding proteins. As shown in Fig. 1, zinc finger domains in the 
naturally-occurring Zif268 protein are positioned tandemly along the DNA double helix. 
Each domain independently recognizes a different 3-4 bp DNA segment. 

A database of zinc finger domains. The one-hybrid selection system 
described above can be utilized to identify one or more zinc finger domains for each 
possible 3 or 4 basepair binding site. The results can be stored as a matrix or database, 
e.g., a relational database. The database can include an indication of the relative 
affinity of the zinc finger domains that bind each site. 
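
One possible layout for such a relational database is sketched below using Python's 
standard `sqlite3` module. The table name, column names, domain names, and affinity 
values are all invented placeholders for illustration; the text does not prescribe a 
schema. 

```python
import sqlite3

# In-memory database with one row per (domain, binding site) observation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE binding ("
    "domain TEXT, site TEXT, relative_affinity REAL, "
    "PRIMARY KEY (domain, site))"
)

# Placeholder rows; the names and numbers are invented for illustration only.
rows = [
    ("domain_1", "GAGC", 1.0),
    ("domain_2", "GAGC", 0.4),
    ("domain_2", "AGGG", 0.9),
]
conn.executemany("INSERT INTO binding VALUES (?, ?, ?)", rows)


def domains_for_site(site):
    """Return (domain, relative_affinity) pairs for a site, strongest first."""
    cur = conn.execute(
        "SELECT domain, relative_affinity FROM binding "
        "WHERE site = ? ORDER BY relative_affinity DESC",
        (site,),
    )
    return cur.fetchall()


print(domains_for_site("GAGC"))
```

Keying the table on the (domain, site) pair allows one domain to be recorded against 
several sites, which matches the matrix view of the screening results. 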

Such zinc finger domains can also be tested in the context of multiple 
different fusion proteins to verify their specificity. Moreover, particular binding sites 
for which a paucity of domains is available can be the target of additional selection 
screens. Libraries for such selections can be prepared by mutagenizing a zinc finger 
domain that binds a similar yet distinct site. A complete matrix of zinc finger 
domains for each possible binding site is not essential, as the domains can be 
staggered relative to the target binding site in order to best utilize the domains 
available. Such staggering can be accomplished both by parsing the binding site into 
the most useful 3 or 4 basepair binding sites, and also by varying the linker length 
between zinc finger domains. In order to incorporate both selectivity and high affinity 
into the designed polypeptide, zinc finger domains that have high specificity for a 
desired site can be flanked by other domains that bind with higher affinity, but lesser 
specificity. The in vivo screening method described herein can be used to test the in vivo 
function, affinity, and specificity of an artificially assembled zinc finger protein and 
derivatives thereof. Likewise, the method can be used to optimize such assembled 
proteins, e.g., by creating libraries of varied linker composition, zinc finger domain 
modules, zinc finger domain compositions, and so forth. 

Parsing a target site. The target 9-bp or longer DNA sequence is parsed into 3 
or 4 bp segments. Zinc finger domains are identified (e.g., from a database described 
above) that recognize each parsed 3 or 4 bp segment. Longer target sequences, e.g., 20 
bp to 500 bp sequences, are also suitable targets as 9 bp, 12 bp, and 15 bp subsequences 
can be identified within them. In particular, subsequences amenable for parsing into 
sites well represented in the database can serve as initial design targets. 
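
The search for well-represented subsequences can be sketched as a sliding-window scan. 
The following Python sketch is illustrative only: the function names, the scoring rule 
(count of 3-bp segments found in the database), and the example set of known sites are 
assumptions, not part of the disclosed method. 

```python
def parse_triplets(window):
    """Split a window whose length is a multiple of 3 into 3-bp segments."""
    return [window[i:i + 3] for i in range(0, len(window), 3)]


def rank_windows(target, known_sites, width=9):
    """Rank every `width`-bp window by how many of its 3-bp segments are known.

    Returns (covered, start, segments) tuples, best coverage first; ties are
    broken by the leftmost start position.
    """
    target = target.upper()
    scored = []
    for start in range(len(target) - width + 1):
        segments = parse_triplets(target[start:start + width])
        covered = sum(seg in known_sites for seg in segments)
        scored.append((covered, start, segments))
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical database content: 3-bp sites for which a domain is available.
known = {"GAC", "ATC", "GAG"}
best = rank_windows("TGACATCGAGC", known)[0]
print(best)
```

A 4-bp parse or a variable linker length would need a different segmentation step, but 
the ranking idea stays the same. 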

Constructing Assembled Modules. Polypeptide sequences are designed to 
contain multiple zinc finger domains that recognize adjacent 3 or 4 bp subsites, or 
nearby subsites. A nucleic acid sequence encoding the designed polypeptide 
sequence can be synthesized. Methods for constructing synthetic genes are routine in 
the art. Such methods include gene construction from custom synthesized 
oligonucleotides, PCR mediated cloning, and mega-primer PCR. Multiple nucleic 
acid sequences can be synthesized, e.g., to form a library. For example, the library 
nucleic acids can be designed such that the sequences encoding a domain at any given 
position vary such that they encode different zinc finger domains whose recognition 
specificity is suitable for that position. Sexual PCR and "DNA Shuffling™" 
(Maxygen, Inc., CA) can be used to vary the identity of zinc finger domains at each 
position. 
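
The size and content of such a library follow from the candidates available at each 
position. The Python sketch below enumerates the combinations; the candidate labels 
are placeholders and the helper name `library_designs` is hypothetical. 

```python
from itertools import product


def library_designs(candidates_by_position):
    """Enumerate every ordered combination of one candidate per position."""
    return [tuple(combo) for combo in product(*candidates_by_position)]


# Placeholder labels: two candidates for position 1, one for position 2,
# and three for position 3.
candidates = [["A1", "A2"], ["B1"], ["C1", "C2", "C3"]]
designs = library_designs(candidates)
print(len(designs), "designs, e.g.", designs[0])
```

The number of designs is the product of the candidate counts per position, which 
indicates how quickly a library grows as more domains become available. 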

Peptide Linkers. DNA binding domains can be connected by a variety of 
linkers. The utility and design of linkers are well known in the art. A particularly 
useful linker is a peptide linker that is encoded by nucleic acid. Thus, one can 
construct a synthetic gene that encodes a first DNA binding domain, the peptide linker, 
and a second DNA binding domain. This design can be repeated in order to construct 
large, synthetic, multi-domain DNA binding proteins. PCT WO 99/45132 and Kim 
and Pabo ((1998) Proc. Natl. Acad. Sci. USA 95:2812-7) describe the design of 
peptide linkers suitable for joining zinc finger domains. 

Additional peptide linkers are available that form random coil, α-helical or 
β-pleated tertiary structures. Polypeptides that form suitable flexible linkers are well 
known in the art (see, e.g., Robinson and Sauer (1998) Proc Natl Acad Sci USA, 
95:5929-34). Flexible linkers typically include glycine, because this amino acid, 
which lacks a side chain, is unique in its rotational freedom. Serine or threonine can 
be interspersed in the linker to increase hydrophilicity. In addition, amino acids 
capable of interacting with the phosphate backbone of DNA can be utilized in order to 
increase binding affinity. Judicious use of such amino acids allows for balancing 
increases in affinity with loss of sequence specificity. If a rigid extension is desirable 
as a linker, α-helical linkers, such as the helical linker described in Pantoliano et al. 
(1991) Biochem. 30:10117-10125, can be used. Linkers can also be designed by 
computer modeling (see, e.g., U.S. Pat. No. 4,946,778). Software for molecular 
modeling is commercially available (e.g., from Molecular Simulations, Inc., San 
Diego, CA). The linker is optionally optimized, e.g., to reduce antigenicity and/or to 
increase stability, using standard mutagenesis techniques and appropriate biophysical 
tests as practiced in the art of protein engineering, and functional assays as described 
herein. 

For implementations utilizing zinc finger domains, the peptide that occurs 
naturally between zinc fingers can be used as a linker to join fingers together. A 
typical such naturally occurring linker is: Thr-Gly-(Glu or Gln)-(Lys or Arg)-Pro-(Tyr 
or Phe) (SEQ ID NO:78) (Agata et al., supra). 
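
The linker consensus given above can be written as a one-letter-code pattern and used 
to locate candidate linkers in a protein sequence. The Python sketch below is 
illustrative; the helper name `find_linkers` and the test peptide are assumptions made 
up for this example. 

```python
import re

# Thr-Gly-(Glu or Gln)-(Lys or Arg)-Pro-(Tyr or Phe) in one-letter code.
LINKER = re.compile(r"TG[EQ][KR]P[YF]")


def find_linkers(protein):
    """Return (start, match) pairs for every linker-like hexapeptide."""
    return [(m.start(), m.group()) for m in LINKER.finditer(protein.upper())]


# Made-up peptide containing two linker-like stretches.
print(find_linkers("AAATGEKPFCCCTGQRPYDD"))
```

The six alternatives at the three variable positions give eight distinct hexapeptides 
that satisfy the consensus. 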

Dimerization Domains. An alternative method of linking DNA binding 
domains is the use of dimerization domains, especially heterodimerization domains 
(see, e.g., Pomerantz et al. (1998) Biochemistry 37:965-970). In this implementation, 
DNA binding domains are present in separate polypeptide chains. For example, a 
first polypeptide includes DNA binding domain A, linker, and domain B, while a 
second polypeptide includes domain C, linker, and domain D. An artisan can select a 
dimerization domain from the many well-characterized dimerization domains. 
Domains that favor heterodimerization can be used if homodimers are not desired. A 
particularly adaptable dimerization domain is the coiled-coil motif, e.g., a dimeric 
parallel or anti-parallel coiled-coil. Coiled-coil sequences that preferentially form 
heterodimers are also available (Lumb and Kim, (1995) Biochemistry 34:8642-8648). 
Another species of dimerization domain is one in which dimerization is triggered by a 
small molecule or by a signaling event. For example, a dimeric form of FK506 can be 
used to dimerize two FK506 binding protein (FKBP) domains. Such dimerization 
domains can be utilized to provide additional levels of regulation. 

Functional Assays and Uses 

In addition to biochemical assays, the function of a nucleic acid binding domain 
or a protein designed by a method described herein, e.g., by modular assembly, can be 
assayed or used in vivo. For example, domains can be selected to bind to a target site, 
e.g., to a promoter site of a gene required for cell proliferation. By modular assembly, a 
protein can be designed that includes (1) the selected domains that respectively bind to 
subsites spanning the target promoter site, and (2) a DNA repression domain, e.g., a 
WRPW domain. 

A nucleic acid sequence encoding a designed protein can be cloned into an 
expression vector, e.g., an inducible expression vector as described in Kang and Kim, 
(2000) J Biol Chem 275:8742. The inducible expression vector can include an inducible 
promoter or regulatory sequence. Non-limiting examples of inducible promoters include 
steroid-hormone responsive promoters (e.g., ecdysone-responsive, estrogen-responsive, 
and glucocorticoid-responsive promoters), the tetracycline "Tet-On" and "Tet-Off" 
systems, and metal-responsive promoters. The construct can be transfected into tissue 
culture cells or into embryonic stem cells to generate a transgenic organism as a model 
subject. The efficacy of the designed protein can be determined by inducing expression 
of the protein and assaying cell proliferation of the tissue culture cell or assaying for 
developmental changes and/or tumor growth in a transgenic animal model. In addition, 
the level of expression of the gene being targeted can be assayed by routine methods to 
detect mRNA, e.g., RT-PCR or Northern blots. A more complete diagnostic includes 
purifying mRNA from cells expressing and not expressing the designed protein. The 
two pools of mRNA are used to probe a microarray containing probes to a large 
collection of genes, e.g., a collection of genes relevant to the condition of interest (e.g., 
cancer) or a collection of genes identified in the organism's genome. Such an assay is 
particularly valuable for determining the specificity of the designed protein. If the 
protein binds with high affinity but little specificity, it may cause pleiotropic and 
undesirable effects by affecting expression of genes in addition to the contemplated 
target. Such effects are revealed by a global analysis of transcripts. 

In addition, the designed protein can be produced in a subject cell or subject 
organism in order to regulate an endogenous gene. The designed protein is configured, 
as described above, to bind to a region of the endogenous gene and to provide a 
transcriptional activation or repression function. As described in Kang and Kim (supra), 
the expression of a nucleic acid encoding the designed protein can be operably linked to 
an inducible promoter. By modulating the concentration of the inducer for the promoter, 
the expression of the endogenous gene can be regulated in a concentration-dependent 
manner. 

Assaying binding site preference 

The binding site preference of each domain can be verified by a biochemical 
assay such as EMSA, DNase footprinting, surface plasmon resonance, or column 
binding. The substrate for binding can be a synthetic oligonucleotide encompassing 
the target site. The assay can also include non-specific DNA as a competitor, or 
specific DNA sequences as a competitor. Specific competitor DNAs can include the 
recognition site with one, two, or three nucleotide mutations. Thus, a biochemical 
assay can be used to measure not only the affinity of a domain for a given site, but 
also its affinity to the site relative to other sites. Rebar and Pabo, (1994) Science 
263:671-673 describe a method of obtaining apparent K_d constants for zinc finger 
domains from EMSA. 

The present invention will be described in more detail through the following 
practical examples. However, it should be noted that these examples are not intended 
to limit the scope of the present invention. 

Example 1: Construction of plasmids for hybrid transcription factor expression. 

An expression plasmid expressing a zinc finger transcription factor was 
prepared by modification of pPC86 (Chevray and Nathans, (1991) Proc. Natl. Acad. 
Sci. USA 89:5789-5793). Manipulations of DNA were performed as described in 
Ausubel et al. (Current Protocols in Molecular Biology (1998), John Wiley and Sons, 
Inc.). A DNA fragment encoding Zif268 zinc finger protein was inserted between the 
SalI and EcoRI recognition sites of pPC86 to generate pPCFM-Zif. The result of this 
cloning step is a translational fusion encoding the yeast Gal4 activation 
domain followed by the three Zif268 zinc fingers. Transformation of pPCFM-Zif into 
yeast cells results in expression of a hybrid transcription factor comprising the yeast 
Gal4 activation domain and the Zif268 zinc fingers. The DNA sequence encoding the 
Zif268 zinc finger protein as cloned in pPCFM-Zif is shown in Fig. 9. 

The plasmid pPCFMS-Zif was utilized as a vector for constructing libraries of 
zinc finger domains (Fig. 8). pPCFMS-Zif was constructed by insertion of an 
oligonucleotide cassette containing a stop codon and a PstI recognition site in front of 
the finger 3 coding region of pPCFM-Zif. The oligonucleotide cassette was formed 
by annealing two synthetic oligonucleotides: 
5'-TGCCTGCAGCATTTGTGGGAGGAAGTTTG-3' (SEQ ID NO:79); and 
5'-ATGCTGCAGGCTTAAGGCTTCTCGCCGGTG-3' (SEQ ID NO:80). The insertion 
of a stop codon prevents the generation of library plasmids encoding finger 3 of 
Zif268. 

The plasmid was used as a vector for the generation of zinc finger domain 
libraries as described in "Example 2" below. 

In addition, gap repair cloning of DNA sequences encoding individual zinc 
finger domains was carried out as described in Hudson et al. ((1997) Genome 
Research 7:1169-1173) with minor modification. 

To clone an individual zinc finger domain, two overlapping oligonucleotides 
were synthesized. Each oligonucleotide included a 21-nucleotide-long common tail at 
its 5' end for second round PCR (rePCR) and a specific sequence that annealed to the 
nucleic acid encoding the individual zinc finger domain. The sequences of the 
forward and back primers were 5'-ACCCACACTGGCCAGAAACCC-N(48-51)-3' 
(SEQ ID NO:108) and 5'-GATCTGAATTCATTCACCGGT-N(42-45)-3' (SEQ ID 
NO:109), respectively, where N(48-51) and N(42-45) correspond to the customized sequence 
for annealing to the nucleic acid encoding the zinc finger domain. Double stranded 
DNA was prepared by amplifying template nucleic acid with an equimolar mixture of 
two oligonucleotides. PCR conditions consisted of a first cycle at 94°C for 3 minutes 
followed by 5 cycles of 94°C for 1 minute, 50°C for 1 minute, and 72°C for 30 
seconds. 

The double stranded DNA encoding each zinc finger domain was then used as 
a template in second round PCR. The rePCR primers had two regions, one region that 
is identical to yeast vector pPCFM-Zif and a second region that is identical to the 
21-nucleotide-long common tail sequence described above. The sequence of forward 
primer was 

5'-TGTCGAATCTGCATGCGTAACTTCAGTCGTAGTGACCACCTTACCACCCA 
CATCCGGACCCACACTGGCCAGAAACCC-3' (SEQ ID NO:138) and that of 
reverse primer was 

5'-GGTGGCGGCCGTTACTTACTTAGAGCTCGACGTCTTACTTACTTAGC 
GGCCGCACTAGTAGATCTGAATTCATTCACCGGT-3' (SEQ ID NO:139). The 
reaction mixture contained 2.5 pmoles of each primer, 1.5 mM Mg2+, 2 units of Taq 
polymerase and 0.01 units of Pfu polymerase in 25 µL. Reactions were carried out at 
94°C for 3 min, then cycled through 20 cycles of 94°C for 1 min, 65°C for 1 min, and 
72°C for 30 sec. Gap repair cloning was performed by transforming the mixture of 
rePCR products and linearized pPCFM-Zif vector that had been digested with MscI 
and EcoRI into yeast YW1 cells. The region identical to the yeast vector pPCFM-Zif 
allows for homologous recombination with the vector in cells. 


Example 2: Construction of Zinc Finger Domain Library 

A plasmid library of naturally occurring zinc finger domains was prepared by 
cloning zinc finger domains from the human genome. DNA segments encoding zinc 
finger domains were amplified from template human genomic DNA (purchased from 
Promega Corporation, Madison, WI, USA) using PCR and degenerate oligonucleotide 
primers. The DNA sequences of the degenerate PCR primers used to clone human 
zinc finger domains were as follows: 5'-GCGTCCGGACNCAYACNGGNSARA-3' 
(SEQ ID NO:81) and 5'-CGGAATTCANNBRWANGGYYTYTC-3' (SEQ ID 
NO:82), wherein R represents G and A; B represents G, C, and T; S represents G and 
C; W represents A and T; Y represents C and T; and N represents A, C, G, and T. 

The degenerate PCR primers anneal to nucleic acid sequences coding for an 
amino acid profile, His-Thr-Gly-(Glu or Gln)-(Lys or Arg)-Pro-(Tyr or Phe) (SEQ ID 
NO:83), that is found at the junction between zinc finger domains in many naturally 
occurring zinc finger proteins (Agata et al. (1998) Gene 213:55-64). 
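
The degeneracy of each primer, i.e., the number of distinct sequences in the primer 
mixture, follows directly from the ambiguity codes defined above. The Python sketch 
below computes it; the dictionary and function names are assumptions introduced for 
illustration. 

```python
# Ambiguity codes as defined in the text: each code maps to the bases it represents.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "B": "CGT", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(AMBIGUITY[base])
    return count


forward = "GCGTCCGGACNCAYACNGGNSARA"  # SEQ ID NO:81
reverse = "CGGAATTCANNBRWANGGYYTYTC"  # SEQ ID NO:82
print(degeneracy(forward), degeneracy(reverse))
```

The count is the product of the number of bases allowed at each position, so every 
additional N multiplies the complexity of the primer pool by four. 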

The buffer composition of the PCR reaction was 50 mM KCl, 3 mM MgCl2, 
10 mM Tris pH 8.3. Taq DNA polymerase was added and the reaction mixture was 
incubated at 94°C for 30 seconds, at 42°C for 60 seconds, and then at 72°C for 30 
seconds. This cycle was repeated 35 times, and was followed by a final incubation at 
72°C for 10 minutes. 

The PCR products were cloned into pPCFMS-Zif as follows: The PCR 
products were electrophoresed, and the DNA segments corresponding to about 120 bp 
were isolated. After digestion with BspEI and EcoRI, the 120-bp DNA segments 
were ligated into pPCFMS-Zif. As a result, the DNA-binding domain of the hybrid 
transcription factor encoded by this plasmid library consists of finger 1 and finger 2 of 
Zif268 and a zinc finger domain derived from the human genome. The plasmid library 
was prepared from a total of 10^6 Escherichia coli transformants. This library 
construction scheme retains the naturally occurring linker sequence found between 
zinc finger domains. 

Example 3: Construction of Zinc Finger Domain Library 

A library of mutant zinc finger domains was prepared by random mutagenesis. 
Finger 3 of Zif268 was used as a polypeptide framework. Random mutations were 
introduced at positions -1, 2, 3, 4, 5, and 6 along the α-helix, corresponding 
respectively to the arginine at position 73, aspartic acid at position 75, glutamic acid 
at position 76, arginine at position 77, lysine at position 78, and arginine at position 
79 of SEQ ID NO:21 (within finger 3 of Zif268). 

At each of the nucleic acid sequence positions encoding these amino acids, a 
randomized codon, 5'-(G/A/C)(G/A/C/T)(G/C)-3', was introduced. This randomized 
codon encodes any one of 16 amino acids (excluding four amino acids: tryptophan, 
tyrosine, cysteine and phenylalanine). Also excluded are all three possible stop 
codons. The randomized codons were introduced with an oligonucleotide cassette 
constructed from two oligonucleotides: 

5'-GGGCCCGGGGAGAAGCCTTACGCATGTCCAGTCGAATCTTGTGATAGAAGATTC-3' 
(SEQ ID NO:84); and 
5'-CTCCCCGCGGTTCGCCGGTGTGGATTCTGATATGSNBSNBAAGSNBSNBSNBSNBTGAGAATCTTCTATCACAAG-3' 
(SEQ ID NO:85), wherein B 
represents G, T, and C; S represents G and C; and N represents A, G, C, and T. 
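
The stated properties of the randomized codon can be checked by enumerating it against 
the standard genetic code. The Python sketch below is an illustrative check; the 
variable and function names are assumptions introduced here. 

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_residues(first="GAC", second="GACT", third="GC"):
    """Return the set of residues encoded by the 5'-(G/A/C)(G/A/C/T)(G/C)-3' codon."""
    return {CODON_TABLE[a + b + c] for a, b, c in product(first, second, third)}


residues = encoded_residues()
missing = set("ACDEFGHIKLMNPQRSTVWY") - residues
print(len(residues), "residues encoded; missing:", sorted(missing))
```

Excluding T from the first codon position is what removes the three stop codons and 
the four residues whose codons all begin with T. 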

After annealing these two oligonucleotides, the DNA duplex cassette was 
synthesized by reaction with Klenow polymerase for 30 minutes. After digestion with 
AvaI and SacII, the DNA duplex was ligated into pPCFMS-Zif digested with SgrAI 
and SacII. Plasmids were isolated from about 10^9 E. coli transformants. 

Example 4: Construction of reporter plasmids 

Reporter plasmids including the yeast HIS3 gene were prepared by 
modification of pRS315His (Wang and Reed (1993) Nature 364:121-126). The 
reporter plasmids also contain the LEU2 marker under its natural promoter for the 
purpose of selecting transformants bearing the plasmid. First, the SalI recognition site 
in pRS315His was removed by ligating the small fragment of pRS315His after 
digestion with SalI and BamHI and the large fragment of pRS315His after digestion 
with BamHI and XhoI to make pRS315HisΔSal. Next, a new SalI recognition site 
was created within the promoter region of the HIS3 gene by inserting an 
oligonucleotide duplex into pRS315HisΔSal between the BamHI and SmaI sites. The 
sequences of the two oligonucleotides that were annealed to produce the inserted 
duplex are 

5'-CTAGACCCGGGAATTCGTCGACG-3' (SEQ ID NO:86); and 
5'-GATCCGTCGACGAATTCCCGGGT-3' (SEQ ID NO:87). The resulting 
plasmid was named pRS315HisMCS. 

Multiple reporter plasmids were constructed by inserting desired composite 
sequences into pRS315HisMCS. The composite sequences are inserted as a tandem 
array containing four copies of the composite sequence. The target sequences were 
derived from 10-bp DNA sequences (Fig. 6) found in the LTR region of HIV-1: 

5'-GAC ATC GAG C-3' (SEQ ID NO:1) HIV-1 LTR (-124/-115) 
5'-GCA GCT GCT T-3' (SEQ ID NO:2) HIV-1 LTR (-23/-14) 
5'-GCT GGG GAC T-3' (SEQ ID NO:3) HIV-1 LTR (-95/-86) 

and in the promoter of the human CCR5 gene: 

5'-AGG GTG GAG T-3' (SEQ ID NO:4) human CCR5 (-70/-79) 
5'-GCT GAG ACA T-3' (SEQ ID NO:5) human CCR5 (+7/+16). 

Each of these 10-bp DNA sequences can be parsed into component 4-bp target 
sites in order to identify a zinc finger domain that recognizes each region of the site. 
Using the modular assembly method, such zinc finger domains can be coupled to 
produce a DNA binding protein that recognizes the site in vivo. 

The underlined portions in Fig. 6 depict examples of 4-bp target sequences. 
Each of these 4-bp target sequences was connected to the 5-bp recruitment sequence, 
5'-GGGCG-3', that is recognized by finger 1 and finger 2 of Zif268. The resulting 
9-bp sequences constitute composite binding sequences. Each composite binding 
sequence has the following format: 

5'-XXXXGGGCG-3', where XXXX is the 4-bp target sequence and the adjacent 
5'-GGGCG-3' is the recruitment sequence. 

Fig. 7 recites the DNA sequences of the inserted tandem arrays of composite 
binding sites, each of which was operably linked to the reporter gene in 
pRS315HisMCS. Each tandem array contains 4 copies of a composite binding 
sequence. For each binding site, two oligonucleotides were synthesized, annealed and 
ligated into pRS315HisMCS restricted at the SalI and XmaI sites to make a reporter 
plasmid. 

Example 5: Construction of reporter plasmids 

A set of reporter plasmids that includes a pair of reporters (one having lacZ,
the other having HIS3) for each 3-basepair subsite was constructed as follows.
Reporter plasmids were constructed by inserting the desired target sequences into
pRS315HisMCS and pLacZi. For each 3-basepair target site, two oligonucleotides
were synthesized, annealed, and inserted between the SalI and XmaI sites of
pRS315HisMCS and of pLacZi to make reporter plasmids. The DNA sequences of
the oligonucleotides were as follows: 5'-CCGGT NNNTGGGCG TAC
NNNTGGGCG TCA NNNTGGGCG-3' (SEQ ID NO:88) and 5'-TCGA
CGCCCANNN TGA CGCCCANNN GTA CGCCCANNN A-3' (SEQ ID NO:89).
A total of 64 pairs of oligonucleotides were synthesized and inserted into the two reporter
plasmids.
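The 64 oligonucleotide pairs follow mechanically from the two templates above, as the sketch below shows. The helper names are hypothetical. The sketch assumes that the NNN positions of the second oligonucleotide are the reverse complement of the subsite in the first, which is what annealing requires.

```python
from itertools import product

# Illustrative sketch only: helper names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def reporter_oligos(subsite: str) -> tuple[str, str]:
    """Build the (top, bottom) oligonucleotide pair for one 3-bp subsite.

    top    = 5'-CCGGT NNNTGGGCG TAC NNNTGGGCG TCA NNNTGGGCG-3'
    bottom = 5'-TCGA <reverse complement of the duplex core> A-3'
    """
    unit = subsite.upper() + "TGGGCG"
    core = unit + "TAC" + unit + "TCA" + unit
    top = "CCGGT" + core
    bottom = "TCGA" + revcomp(core) + "A"
    return top, bottom

# One pair per 3-bp subsite: 4**3 = 64 pairs in total.
ALL_PAIRS = {"".join(t): reporter_oligos("".join(t)) for t in product("ACGT", repeat=3)}
```

The 5' ends CCGG and TCGA are left single-stranded after annealing, consistent with cloning between XmaI and SalI sites.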

Example 6: Selection of zinc finger domains with desired DNA-binding
specificity

To select zinc finger domains that specifically bind given target sequences,
yeast cells were transformed first with a reporter plasmid and then with a library of
hybrid plasmids encoding hybrid transcription factors. Yeast transformation and screening
procedures were carried out as described in Ausubel et al. (Current Protocols in
Molecular Biology (1998), John Wiley and Sons, Inc.). Yeast strain yWAM2
(MATα Δgal4 Δgal80 URA3::GAL1-lacZ lys2-801 his3-Δ200 trp1-Δ63
leu2 ade2-101 CYH2) was used.

In one instance, yeast cells were first transformed with a reporter plasmid
containing the composite binding sequence 5'-GAGCGGGCG-3' (the 4-bp target
sequence is GAGC), which was operably linked to the reporter gene. Then, the
plasmid library of mutant zinc finger domains prepared by random mutagenesis was
introduced into the transformed yeast cells. About 10^6 colonies were obtained in
medium lacking both leucine and tryptophan. Because the reporter plasmid and the
zinc finger domain expression plasmids contain yeast LEU2 and TRP1 genes,
respectively, as markers, yeast cells were grown in medium lacking both leucine and
tryptophan in order to select for cells that contain both the reporter and the zinc finger
domain expression plasmid.

In one implementation, the library of zinc finger domains derived from the
human genome was transformed into cells bearing the reporter plasmids. The
transformation was performed on five different host cell strains, each strain containing
one of five different target sequences operably linked to the reporter gene. About 10^5
colonies were obtained per transformation in medium lacking both leucine and
tryptophan. Transformants were grown on petri plates containing synthetic medium
lacking leucine and tryptophan. After incubation, transformed cells were collected by
applying a 10% sterile glycerol solution to the plates, scraping the colonies into the
solution, and retrieving the solution. Cells were stored as frozen aliquots in the
glycerol solution. A single aliquot was spread onto medium lacking leucine,
tryptophan, and histidine. 3-aminotriazole (AT) was added to the growth medium at
final concentrations of 0, 0.03, 0.1, and 0.3 mM. AT is a competitive inhibitor of
His3 and titrates the sensitivity of the HIS3 selection system. AT suppressed the basal
activity of His3. Such basal activity can arise from leaky expression of the HIS3 gene
on the reporter plasmid. Out of about 10^7 yeast cells spread on medium, on the order
of hundreds of colonies grew in the selective medium lacking AT. The number of
colonies gradually decreased as the concentration of AT increased. On the order of
tens of colonies grew in the selective medium containing 0.3 mM of AT. Several
colonies were randomly picked from the medium lacking AT and from the medium
containing 0.3 mM of AT. Plasmids were isolated from yeast cells and transformed
into Escherichia coli strain KC8 (pyrF leuB600 trpC hisB463). The plasmids
encoding zinc finger transcription factors were isolated, and the DNA sequences of
selected zinc finger domains were determined.

The amino acid sequence of each selected zinc finger domain was deduced
from the DNA sequence. Each zinc finger domain was named after the four amino
acid residues at the base-contacting positions, namely positions -1, 2, 3, and 6 along the
alpha-helix. The results are shown in Table 1. Analysis of the
sequences showed that in some cases the same zinc finger domain was obtained
repeatedly. The numbers in parentheses in Table 1 indicate how many times the
same zinc finger domain was obtained. For example, two zinc fingers having
CSNR at the four base-contacting positions were identified as binding the GAGC
nucleic acid site (see column 3, "GAGC/human genome").

Table 1

Target sequence          GAGC          GAGC      GCTT      GACT      GAGT      ACAT

Origin of zinc finger    random        human     human     human     human     human
domain library           mutagenesis   genome    genome    genome    genome    genome

Amino acid residues at   KTNR(2)       RTNR(2)   VSTR(9)   HSNK(2)   RDER(2)   QSTV(3)
base-contacting          RTTR          RTNR                CSNR(7)   SSNR(5)
positions*               RPNR          CSNR(2)
                         HSNR          SSNR(3)
                         RLKP          RSTV
                         TRQR          SSGE
                         TALH
                         RQKA
                         PARV
                         RTFR
                         RNNR
                         DPLH
                         RGNR

* The four-letter identifiers in the six columns to the right are the descriptors of
the zinc finger domains isolated for each target sequence. Although these names are
indicative of the amino acid residues at base-contacting positions, they are not sequences
of polypeptides.
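The four-letter descriptors can be derived from a domain sequence with a short sketch. It assumes the conventional helix numbering in which the first zinc-coordinating histidine sits at position 7, so that positions -1, 2, 3, and 6 lie 7, 5, 4, and 1 residues before it. It also assumes the domain sequence ends at the second coordinating histidine, as the domain sequences in this document do. The function name is hypothetical.

```python
import re

# Illustrative sketch only: assumes helix position 7 is the first
# zinc-coordinating histidine and the sequence ends at the second one.
def finger_name(domain: str) -> str:
    """Return the residues at helix positions -1, 2, 3, and 6."""
    seq = domain.replace(" ", "").upper()
    match = re.search(r"H.{3,5}H$", seq)  # H-X(3-5)-H at the C-terminus
    if match is None or match.start() < 7:
        raise ValueError("no C-terminal H-X(3-5)-H motif with a preceding helix")
    first_his = match.start()
    # Offsets before the first histidine: -1 -> 7, 2 -> 5, 3 -> 4, 6 -> 1.
    return "".join(seq[first_his - offset] for offset in (7, 5, 4, 1))

# Domain sequences taken from SEQ ID NO:25 and SEQ ID NO:33 in this document.
HSNK_DOMAIN = "YKCKECGKAFNHSSNFNKHHRIH"
VSTR_DOMAIN = "YECNYCGKTFSVSSTLIRHQRIH"
```

Applied to the two sequences above, the function returns the descriptors under which those domains appear in Table 1.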

The full DNA sequences encoding selected human zinc finger domains and
their translated amino acid sequences are shown in Fig. 11. The DNA sequence that
is complementary to the degenerate PCR primers used to amplify DNA segments
encoding zinc finger domains in the human genome is underlined. This sequence may
differ from the original base sequence of the reported human genome sequence due to
either allelic differences or alterations introduced during amplification.

Most human zinc finger domains identified by screening in accordance with
the present invention either were novel polypeptides or corresponded to anonymous
open reading frames. For example, zinc finger domains designated as HSNK
(contained in the sequence reported in GenBank accession number AF155100) and
VSTR (contained in the sequence reported in GenBank accession number AF02577)
are found in proteins whose function is as yet unknown. The results described herein
not only indicate that these zinc finger domains are able to function as sequence-specific
DNA-binding domains, but also document their binding site
preference in the context of chimeric proteins.

In addition, the present invention reveals that zinc finger domains obtained
from the human genome can be used as modular building blocks to construct novel
DNA-binding proteins. Human zinc finger domains of the present invention were
obtained as a result of their functionality in vivo when connected to the C-terminus of
finger 1 and finger 2 of Zif268. Thus, the identified zinc finger domains can
recognize specific sequences in an artificial context, and are suitable as modular
building blocks for designing synthetic transcription factors.

Example 7: Pairwise Mating 

To facilitate identification of zinc finger domains that bind to each 3-basepair
target site, yeast mating was used to eliminate the need for repetitively transforming
yeast cells and to search for positive binders to each of the 64 reporter constructs with
a single transformation. Two yeast strains, YWI (MATα mating type) and YPH499
(MATa mating type), were used. YWI was derived from yWAM2 by selecting a
clone resistant to 5-fluoroorotic acid (FOA) in order to generate a ura3- derivative of
yWAM2.

The plasmid library of zinc finger domains was introduced into the YWI
cells by yeast transformation. Cells from approximately 10^6 independently
transformed colonies were collected by scraping plates with a 10% glycerol solution.
The solution was frozen in aliquots. Each of the 64 pairs of reporter plasmids (derived from
pLacZi or pRS315His) also was co-transformed into yeast strain YPH499.
Transformants containing both reporter plasmids were harvested and frozen.

After thawing, the yeast cells were grown on minimal media to mid-log phase. 
The two cell types were then mixed and allowed to mate in YPD for 5 h. Diploid 
cells were selected on minimal media containing X-gal and AT (1 mM) but lacking
tryptophan, leucine, uracil, and histidine. After several days, blue colonies that grew 
on the selective plate were isolated. The plasmids encoding zinc finger domains were 
isolated from blue colonies, and the DNA sequences of the selected zinc finger domains 
were determined. 

The nucleic acids isolated from the blue colonies were individually
retransformed into YWI cells. For each isolated nucleic acid, retransformed YWI
cells were mated to YPH499 cells containing each of the 64 LacZ reporter plasmids in
a 96-well plate, and then spread onto minimal media containing X-gal but lacking
tryptophan and uracil. The DNA binding affinities and specificities of a zinc finger
domain for the 64 target sequences were determined by the intensity of blue color.
Control experiments with the Zif268 zinc finger domains indicated that positive
interactions between a zinc finger domain and a binding site yielded dark to pale blue
colonies (whose blue intensity is proportional to the binding affinity) and that
negative interactions yielded white colonies.

Example 8: Comparison of Identified Zinc Finger Domains with an Interaction 
Code 

The amino acid residues of selected zinc finger domains at the critical base-contacting
positions were compared with those anticipated from the zinc finger
domain-DNA interaction code (Fig. 3). Most of the zinc finger domains showed the expected
patterns, i.e., the amino acid residues at the critical positions match well those
predicted from the code.

For example, the consensus amino acid residues in zinc finger domains
selected from the library generated by random mutagenesis were R (Arg; 7 out of 14)
or K (Lys; 2 out of 14) at position -1, N (Asn; 6 out of 14) at position 3, and R (9 out
of 14) at position 6 (Table 1). These zinc finger domains were selected with the
GAGC plasmid. (The reporter plasmid in which the composite binding sequence,
5'-GAGCGGGCG-3', is operably linked to the reporter gene is referred to as the GAGC
plasmid. Likewise, the other reporter plasmids in which the sequence
5'-XXXXGGGCG-3' is operably linked to the reporter gene are referred to as the
XXXX plasmids.) These amino acid residues at critical base-contacting positions
exactly match those expected from the code. [Most of the zinc finger domains in the
human genome contain S (serine) at position 2, and a serine residue is capable of
forming a hydrogen bond with any of the four bases. Thus the effect of this position
will not be considered hereinafter. It is also known that the residues at position 2
usually play only a minor role in base recognition (Pavletich and Pabo (1991) Science
252:809-817).]

The amino acid residues in zinc finger domains obtained from the human
genome also match those expected from the code quite well. For example, the
consensus amino acid residues at positions -1, 3, and 6 in the zinc finger domains
obtained with the GAGC plasmid were R, N, and R, respectively (Table 1, column 3).
These amino acids are exactly those anticipated from the code.

The amino acid residues at positions -1, 3, and 6 in the zinc finger domain
obtained with the GCTT plasmid were V, T, and R, respectively (Table 1, column 4).
The T and R residues are exactly those expected from the code. The amino acid
residues predicted from the code at position -1 that would interact with the base T
(underlined) of the GCTT site are L, T, or N. The VSTR zinc finger domain, which
was selected with the GCTT plasmid, contained V (valine), a hydrophobic amino acid
similar to L (leucine), at this position.

Overall, the amino acid residues in selected zinc finger domains match those
predicted from the code at two or more of the three critical positions.
The amino acid residues in selected zinc finger domains that are expected from the
code are underlined in Table 1. These results strongly suggest that the in vivo
selection system disclosed herein functions as expected.

Example 9: Retransformation and Cross-transformation 

To rule out the possibility of false positive results and to investigate the 
sequence specificity of the zinc finger protein described above, retransformation and 
cross-transformation of yeast cells were carried out using the isolated plasmids. 

Yeast cells were first co-transformed with a reporter plasmid and a hybrid 
plasmid encoding a zinc finger domain. Yeast transformants were inoculated into 
minimal medium lacking leucine and tryptophan and incubated for 36 hours. About 
1,000 cells in the growth medium were spotted directly onto solid medium lacking 
leucine, tryptophan, and histidine (designated as - histidine in Fig. 10) and onto solid 
medium lacking leucine and tryptophan (designated as + histidine in Fig. 10). These 
cells were then incubated for 50 hours at 30°C. The results are shown in Fig. 10. 

It is expected that colonies can grow in the medium lacking histidine when the 
zinc finger moiety of the hybrid transcription factor binds the composite binding 
sequence, allowing the hybrid transcription factor to activate expression of the HIS3 
reporter gene. Colonies cannot grow in the medium lacking histidine when the zinc 
finger moiety of the transcription factor does not bind the composite binding sequence. 

As shown in Fig. 10, the isolated zinc finger domains were capable of binding
corresponding target sequences and showed sequence specificity markedly different
from that of Zif268. Zif268 showed higher activity with the GCGT plasmid than with
the other five plasmids, and relatively high activity with the GAGT plasmid. No
colonies were formed by strains having reporters containing other binding sites and
expressing the Zif268 protein.

The KTNR zinc finger domain isolated from the random mutant library was
originally selected with the GAGC reporter plasmid. As expected, colonies were
formed only with the GAGC plasmid. Zinc finger domains obtained from the library
derived from the human genome also showed the expected specificity. For example,
HSNK, which had been selected with the GACT plasmid, allowed cell growth only
with the GACT plasmid when retransformed into yeast cells. VSTR, which had been
selected with the GCTT plasmid, showed the highest activity with the GCTT plasmid.
RDER, which was selected with the GAGT plasmid, has the same amino acid
residues at the four base-contacting positions as does finger 3 of Zif268. As expected,
this zinc finger domain showed sequence specificity similar to that of finger 3. SSNR,
selected with the GAGC and GAGT plasmids, allowed cell growth on histidine-deficient
medium with the GAGC plasmid but not with the GAGT plasmid. QSTV,
obtained with the ACAT plasmid, did not allow cell growth with any of the plasmids
tested in this assay. However, this zinc finger domain was able to bind to the ACAT
sequence tightly in vitro, as demonstrated below.

Example 10: Gel shift assays 

Zinc finger proteins containing zinc finger domains selected using the
modified one-hybrid system were expressed in E. coli, purified, and used in gel shift
assays. The DNA segments encoding zinc finger proteins in the hybrid plasmids were
isolated by digestion with SalI and NotI and inserted into pGEX-4T2 (Pharmacia
Biotech) between the SalI and NotI sites. Zinc finger proteins were expressed in E.
coli strain BL21 as fusion proteins connected to GST (glutathione-S-transferase). The
fusion proteins were purified using glutathione affinity chromatography (Pharmacia
Biotech, Piscataway, NJ) and then digested with thrombin, which cleaves the
connecting site between the GST moiety and the zinc finger proteins. Purified zinc finger
proteins contained finger 1 and finger 2 of Zif268 and the selected zinc finger domain at
the C-terminus.

The following probe DNAs were synthesized, annealed, labeled with 32P using
T4 polynucleotide kinase, and used in gel shift assays.

GCGT: 5'-CCGGGTCGCGCGTGGGCGGTACCG-3' (SEQ ID NO:90)
      3'-    CAGCGCGCACCCGCCATGGCAGCT-5' (SEQ ID NO:91)

GAGC: 5'-CCGGGTCGCGAGCGGGCGGTACCG-3' (SEQ ID NO:92)
      3'-    CAGCGCTCGCCCGCCATGGCAGCT-5' (SEQ ID NO:93)

GCTT: 5'-CCGGGTCGTGCTTGGGCGGTACCG-3' (SEQ ID NO:94)
      3'-    CAGCACGAACCCGCCATGGCAGCT-5' (SEQ ID NO:95)

GACT: 5'-CCGGGTCGGGACTGGGCGGTACCG-3' (SEQ ID NO:96)
      3'-    CAGCCCTGACCCGCCATGGCAGCT-5' (SEQ ID NO:97)

GAGT: 5'-CCGGGTCGGGAGTGGGCGGTACCG-3' (SEQ ID NO:98)
      3'-    CAGCCCTCACCCGCCATGGCAGCT-5' (SEQ ID NO:99)

ACAT: 5'-CCGGGTCGGACATGGGCGGTACCG-3' (SEQ ID NO:100)
      3'-    CAGCCTGTACCCGCCATGGCAGCT-5' (SEQ ID NO:101)
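Each probe can be checked for the intended composite binding site directly from its lower strand, as in the sketch below. The function names are hypothetical. The lower strands are written 3' to 5' as listed, so complementing them base by base without reversing gives the paired upper-strand sequence 5' to 3'.

```python
# Illustrative sketch only: function names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lower strands of the six probes, written 3'->5' (SEQ ID NOs 91-101, odd numbers).
BOTTOM_3_TO_5 = {
    "GCGT": "CAGCGCGCACCCGCCATGGCAGCT",
    "GAGC": "CAGCGCTCGCCCGCCATGGCAGCT",
    "GCTT": "CAGCACGAACCCGCCATGGCAGCT",
    "GACT": "CAGCCCTGACCCGCCATGGCAGCT",
    "GAGT": "CAGCCCTCACCCGCCATGGCAGCT",
    "ACAT": "CAGCCTGTACCCGCCATGGCAGCT",
}

def paired_top_strand(bottom_3_to_5: str) -> str:
    """Complement a 3'->5' strand position by position, giving the 5'->3' partner."""
    return bottom_3_to_5.translate(_COMPLEMENT)

def carries_composite_site(target: str, bottom_3_to_5: str) -> bool:
    """True if the duplex contains 5'-<target>GGGCG-3' on its upper strand."""
    return (target + "GGGCG") in paired_top_strand(bottom_3_to_5)

CHECKS = {t: carries_composite_site(t, s) for t, s in BOTTOM_3_TO_5.items()}
```

Every probe passes, so each duplex carries its 4-bp target immediately 5' of the GGGCG recruitment sequence.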

Various amounts of a zinc finger protein were incubated with a labeled probe
DNA for one hour at room temperature in 20 mM Tris pH 7.7, 120 mM NaCl, 5 mM
MgCl2, 20 µM ZnSO4, 10% glycerol, 0.1% Nonidet P-40, 5 mM DTT, and 0.10
mg/mL BSA (bovine serum albumin), and then the reaction mixtures were subjected to
gel electrophoresis. The radioactive signals were quantitated by PhosphorImager™
analysis (Molecular Dynamics), and dissociation constants (Kd) were determined as
described (Rebar and Pabo (1994) Science 263:671-673). The results are described in
Table 2. All the constants were determined in at least two separate experiments, and the
standard error of the mean is indicated. Cell growth of yeast transformants on
histidine-deficient minimal medium (Fig. 10) is also indicated in Table 2.

Table 2

Zinc finger protein   Probe DNA   Dissociation constant (nM)   Growth of yeast*

Zif268                GCTT        2.1 ± 0.3                    -
                      GCGT        0.024 ± 0.004                +++
                      GAGT        0.17 ± 0.04                  ++
                      GAGC        2.3 ± 0.9                    -
                      GACT        4.9 ± 0.6                    -
                      ACAT        1.3 ± 0.3                    -

KTNR                  GCGT        5.5 ± 0.7                    -
                      GAGC        0.17 ± 0.01                  ++
                      GACT        30 ± 1                       -

CSNR                  GCGT        2.7 ± 0.3                    -
                      GAGT        0.46 ± 0.04                  +++
                      GAGC        1.2 ± 0.1                    ++
                      GACT        0.17 ± 0.01                  +++

HSNK                  GCGT        42 ± 14                      -
                      GAGT        3.5 ± 0.1                    -
                      GACT        0.32 ± 0.08                  ++

RDER                  GCGT        0.027 ± 0.002                +++
                      GAGT        0.18 ± 0.01                  ++
                      GACT        28 ± 9                       -

SSNR                  GCGT        3.8 ± 1.3
                      GAGC        0.45 ± 0.09
                      GACT        0.61 ± 0.21

VSTR                  GCTT        0.53 ± 0.07
                      GCGT        0.76 ± 0.22
                      GAGT        1.4 ± 0.2

QSTV                  GCTT        29 ± 3
                      GCGT        9.8 ± 3.4
                      ACAT        2.3 ± 0.4

* +++, 20 to 100% growth; ++, 5 to 20% growth; +, 1 to 5% growth; -, < 1%
growth.



Zinc finger proteins that allowed cell growth on histidine-deficient plates
bound the corresponding probe DNAs tightly. For example, the Zif268 protein used
as a control allowed cell growth with the GCGT and GAGT reporter plasmids, and
the dissociation constants measured in vitro using corresponding probe DNAs were
0.024 nM and 0.17 nM, respectively. In contrast, the Zif268 protein did not allow cell
growth with other plasmids, and the dissociation constants measured using
corresponding probe DNAs were higher than 1 nM.

Zinc finger proteins containing novel zinc finger domains also showed similar
results. For example, the KTNR protein showed strong affinity for the GAGC probe
DNA, with a dissociation constant of 0.17 nM, but not for the GCGT or GACT probe
DNA, with dissociation constants of 5.5 nM and 30 nM, respectively. This protein
allowed cell growth only with the GAGC plasmid. The HSNK protein was able to
bind the GACT probe DNA tightly (Kd = 0.32 nM) but not the GCGT or GAGT probe
DNA; as would be expected, the HSNK protein allowed cell growth only with the
GACT plasmid.

The QSTV protein, which was selected with the ACAT reporter plasmid, was
not able to promote cell growth with any of the other reporter plasmids when
retransformed into yeast. Gel shift assays demonstrated that this protein bound the
ACAT probe DNA more tightly than it did the other probe DNAs. That is, QSTV
bound the ACAT probe DNA 13 times or 4.3 times more tightly than it did the GCTT or
GCGT probe DNA, respectively.
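The fold differences quoted for QSTV are ratios of the dissociation constants in Table 2, as the short sketch below reproduces. The function name is hypothetical. A ratio above 1 means tighter binding to the preferred site.

```python
# Illustrative sketch only: Kd values (nM) for QSTV copied from Table 2.
QSTV_KD_NM = {"ACAT": 2.3, "GCTT": 29.0, "GCGT": 9.8}

def fold_preference(kd_other_nm: float, kd_preferred_nm: float) -> float:
    """Ratio of dissociation constants: how much tighter the preferred site binds."""
    if kd_other_nm <= 0 or kd_preferred_nm <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_other_nm / kd_preferred_nm

ACAT_VS_GCTT = fold_preference(QSTV_KD_NM["GCTT"], QSTV_KD_NM["ACAT"])  # about 12.6
ACAT_VS_GCGT = fold_preference(QSTV_KD_NM["GCGT"], QSTV_KD_NM["ACAT"])  # about 4.3
```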

In general, when a zinc finger protein, e.g., having three zinc finger domains, 
binds a DNA sequence with a dissociation constant lower than 1 nM, it allows cell 
growth, whereas when a zinc finger protein binds a DNA sequence with a dissociation 
constant higher than 1 nM, it does not allow cell growth. Zinc finger proteins that 
bind with a dissociation constant of greater than 1 nM, but less than 5 nM can also be 
useful, e.g., in the context of a chimeric zinc finger protein having four zinc finger 
domains. 
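The 1 nM rule of thumb can be compared against the measurements with a minimal sketch. It covers only the Zif268, KTNR, and HSNK rows of Table 2, whose growth results are also stated in the text. The threshold constant and function name are hypothetical, and the rule is the generalization stated above rather than a fitted model.

```python
# Illustrative sketch only: a rule-of-thumb check, not a fitted model.
GROWTH_THRESHOLD_NM = 1.0  # Kd below this is expected to allow growth

# (protein, probe, Kd in nM, growth observed on histidine-deficient medium)
OBSERVATIONS = [
    ("Zif268", "GCGT", 0.024, True),
    ("Zif268", "GAGT", 0.17, True),
    ("Zif268", "GCTT", 2.1, False),
    ("Zif268", "GAGC", 2.3, False),
    ("Zif268", "GACT", 4.9, False),
    ("Zif268", "ACAT", 1.3, False),
    ("KTNR", "GAGC", 0.17, True),
    ("KTNR", "GCGT", 5.5, False),
    ("KTNR", "GACT", 30.0, False),
    ("HSNK", "GACT", 0.32, True),
    ("HSNK", "GCGT", 42.0, False),
    ("HSNK", "GAGT", 3.5, False),
]

def predicts_growth(kd_nm: float) -> bool:
    """Apply the rule: growth is expected when Kd is below the threshold."""
    return kd_nm < GROWTH_THRESHOLD_NM

# Rows where the rule and the observation disagree.
MISMATCHES = [
    (protein, probe)
    for protein, probe, kd_nm, grew in OBSERVATIONS
    if predicts_growth(kd_nm) != grew
]
```

For these twelve measurements the rule and the observed growth agree in every case.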

Example 11: TG-ZFD-001 "CSNR1" 

TG-ZFD-001 "CSNR1" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YKCKQCGKAFGCPSNLRRHGRTH
(SEQ ID NO:23). It is encoded by the human nucleic acid sequence:
5'-
TATAAATGTAAGCAATGTGGGAAAGCTTTTGGATGTCCCTCA
GGCATGGAAGGACTCAC-3' (SEQ ID NO:22).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-001 "CSNR1"
demonstrates recognition specificity for the 3-bp target sequences GAA,
GAC, and GAG. Its binding site preference is GAA > GAC > GAG > GCG as
determined by in vivo screening results and EMSA. In EMSA, the TG-ZFD-001
"CSNR1" fusion to fingers 1 and 2 of Zif268 and the GST purification handles has an
apparent Kd of 0.17 nM for the GAC containing site, 0.46 nM for the GAG containing
site, and 2.7 nM for the GCG containing site.

TG-ZFD-001 "CSNR1" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA, GAC, or GAG. 



Example 12: TG-ZFD-002 "HSNK"

TG-ZFD-002 "HSNK" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YKCKECGKAFNHSSNFNKHHRIH
(SEQ ID NO:25). It is encoded by the human nucleic acid sequence:
5'-TATAAGTGTAAGGAGTGTGGGAAAGCCTTCAACCACAGCTCCAACTTCAATA
AACACCACAGAATCCAC-3' (SEQ ID NO:24).
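Whether a listed nucleic acid sequence encodes the stated domain can be checked by translating it with the standard genetic code. The sketch below does this for the HSNK sequences above. The function and variable names are hypothetical.

```python
from itertools import product

# Illustrative sketch only: standard genetic code, codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence read 5'->3' in frame 1; spaces are ignored."""
    seq = dna.replace(" ", "").upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

# SEQ ID NO:24 (DNA) and SEQ ID NO:25 (protein) as given for TG-ZFD-002 "HSNK".
HSNK_DNA = (
    "TATAAGTGTAAGGAGTGTGGGAAAGCCTTCAACCACAGCTCCAACTTCAATA"
    "AACACCACAGAATCCAC"
)
HSNK_PROTEIN = "YKCKECGKAFNHSSNFNKHHRIH"
```

The 69-nucleotide HSNK sequence translates to the 23-residue domain given above. The same check applies to the other domains whose coding sequences are complete.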

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-002 "HSNK"
demonstrates recognition specificity for the 3-bp target sequence GAC. Its binding
site preference is GAC > GAG > GCG as determined by in vivo screening results and
EMSA. In EMSA, the TG-ZFD-002 "HSNK" fusion to fingers 1 and 2 of Zif268 and
the GST purification handles has an apparent Kd of 0.32 nM for the GAC containing
site, 3.5 nM for the GAG containing site, and 42 nM for the GCG containing site.

TG-ZFD-002 "HSNK" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GAC.

Example 13: TG-ZFD-003 "SSNR"

TG-ZFD-003 "SSNR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECKECGKAFSSGSNFTRHQRIH 
(SEQ ID NO:27). It is encoded by the human nucleic acid sequence: 
5'-
TATGAATGTAAGGAATGTGGGAAAGC

GACATCAGAGAATTCAC-3' (SEQ ID NO:26).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-003 "SSNR"
demonstrates recognition specificity for the 3-bp target sequence GAG. Its binding
site preference is GAG > GAC > GCG as determined by in vivo screening results and
EMSA. In EMSA, the TG-ZFD-003 "SSNR" fusion to fingers 1 and 2 of Zif268 and
the GST purification handles has an apparent Kd of 0.45 nM for the GAG containing
site, 0.61 nM for the GAC containing site, and 3.8 nM for the GCG containing site.

TG-ZFD-003 "SSNR" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GAG or GAC.

Example 14: TG-ZFD-004 "RDER1" 

TG-ZFD-004 "RDER1" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is:
YVCDVEGCTWKFARSDELNRHKKRH (SEQ ID NO:29). It is encoded by the
human nucleic acid sequence:
5'-TATGTATGCGATGTAGAGGGATGTACGTGGAAATTTGCCCGCTCAGATGAGC
TCAACAGACACAAGAAAAGGCAC-3' (SEQ ID NO:28).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-004 "RDER1"
demonstrates recognition specificity for the 3-bp target sequence GCG. Its binding
site preference is GCG > GTG, GAG > GAC as determined by in vivo screening
results and EMSA. In EMSA, the TG-ZFD-004 "RDER1" fusion to fingers 1 and 2
of Zif268 and the GST purification handles has an apparent Kd of 0.027 nM for the
GCG containing site, 0.18 nM for the GAG containing site, and 28 nM for the GAC
containing site.

TG-ZFD-004 "RDER1" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GCG, GTG, or GAG.



Example 15: TG-ZFD-005 "QSTV"

TG-ZFD-005 "QSTV" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YECNECGKAFAQNSTLRVHQRIH
(SEQ ID NO:31). It is encoded by the human nucleic acid sequence:
5'-TATGAGTGTAATGAATGCGGGAAAGCTTTTGCCCAAAATTCAACTCTCAGAG
TACACCAGAGAATTCAC-3' (SEQ ID NO:30).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-005 "QSTV"
demonstrates recognition specificity for the 3-bp target sequence ACA. Its binding
site preference is ACA > GCG > GCT as determined by EMSA. In EMSA, the
TG-ZFD-005 "QSTV" fusion to fingers 1 and 2 of Zif268 and the GST purification
handles has an apparent Kd of 2.3 nM for the ACA containing site, 9.8 nM for the
GCG containing site, and 29 nM for the GCT containing site.

TG-ZFD-005 "QSTV" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence ACA.

Example 16: TG-ZFD-006 "VSTR" 

TG-ZFD-006 "VSTR" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YECNYCGKTFSVSSTLIRHQRIH
(SEQ ID NO:33). It is encoded by the human nucleic acid sequence:
5'-
TATGAGTGTAATTACTGTGGAAAAACCTTTAGTGTG
GACATCAGAGAATCCAC-3' (SEQ ID NO:32).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-006 "VSTR"
demonstrates recognition specificity for the 3-bp target sequence GCT. Its binding
site preference is GCT > GCG > GAG as determined by in vivo screening results and
EMSA. In EMSA, the TG-ZFD-006 "VSTR" fusion to fingers 1 and 2 of Zif268 and
the GST purification handles has an apparent Kd of 0.53 nM for the GCT containing
site, 0.76 nM for the GCG containing site, and 1.4 nM for the GAG containing site.

TG-ZFD-006 "VSTR" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GCT or GCG.

Example 17: TG-ZFD-007 "CSNR2" 

TG-ZFD-007 "CSNR2" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YQCNICGKCFSCNSNLHRHQRTH 
(SEQ ID NO:35). It is encoded by the human nucleic acid sequence: 
5'-TATCAGTGCAACATTTGCGGAAAATGTTTCTCCTGCAACTCCAACCTCCA
CAGGCACCAGAGAACGCAC-3' (SEQ ID NO:34).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-007 "CSNR2"
demonstrates recognition specificity for 3-bp target sequences GAA, GAC, and GAG.
Its binding site preference is GAA > GAC > GAG as determined by in vivo screening
results.

TG-ZFD-007 "CSNR2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA, GAC, or GAG. 



Example 18: TG-ZFD-008 "QSHR1"

TG-ZFD-008 "QSHR1" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YACHLCGKAFTQSSHLRRHEKTH
(SEQ ID NO:37). It is encoded by the human nucleic acid sequence:
5'-TATGCATGTCATCTATGTGGAAAAGCCTTCACTCAGAGTTCTCACCTTAG
AAGACATGAGAAAACTCAC-3' (SEQ ID NO:36).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-008 "QSHR1"
demonstrates recognition specificity for 3-bp target sequences GGA, GAA, and AGA.
Its binding site preference is GGA > GAA > AGA as determined by in vivo screening
results.

TG-ZFD-008 "QSHR1" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GGA, GAA, or AGA.




WO 01/60970 



PCT/KR01/00244 



Example 19: TG-ZFD-009 "QSHR2" 

TG-ZFD-009 "QSHR2" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YKCGQCGKFYSQVSHLTRHQKIH 
(SEQ ID NO:39). It is encoded by the human nucleic acid sequence: 

5'-TATAAATGCGGCCAGTGTGGGAAGTTCTACTCGCAGGTCTCCCACCTCA
CCCGCCACCAGAAAATCCAC-3' (SEQ ID NO:38).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-009 "QSHR2" 
demonstrates recognition specificity for the 3-bp target sequence GGA. 

TG-ZFD-009 "QSHR2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGA. 

Example 20: TG-ZFD-010 "QSHR3" 

TG-ZFD-010 "QSHR3" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YACHLCGKAFTQCSHLRRHEKTH
(SEQ ID NO:41). It is encoded by the human nucleic acid sequence:
5'-TATGCATGTCATCTATGTGGAAAAGCCTTCACTCAGTGTTCTCACCTTAG
AAGACATGAGAAAACTCAC-3' (SEQ ID NO:40).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-010 "QSHR3"
demonstrates recognition specificity for 3-bp target sequences GGA and GAA. Its
binding site preference is GGA > GAA as determined by in vivo screening results.

TG-ZFD-010 "QSHR3" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGA or GAA. 

Example 21: TG-ZFD-011 "QSHR4"

TG-ZFD-011 "QSHR4" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YACHLCAKAFIQCSHLRRHEKTH
(SEQ ID NO:43). It is encoded by the human nucleic acid sequence:
5'-TATGCATGTCATCTATGTGCAAAAGCCTTCATTCAGTGTTCTCA
AGACATGAGAAAACTCAC-3' (SEQ ID NO:42).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-011 "QSHR4"
demonstrates recognition specificity for 3-bp target sequences GGA and GAA. Its
binding site preference is GGA > GAA as determined by in vivo screening results.

TG-ZFD-011 "QSHR4" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GGA or GAA.

Example 22: TG-ZFD-012 "QSHR5" 

TG-ZFD-012 "QSHR5" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YVCRECGRGFRQHSHLVRHKRTH 
(SEQ ID NO:45). It is encoded by the human nucleic acid sequence: 

5'-TATGTTTGCAGGGAATGTGGGCGTGGCTTTCGCCAGCATTCACACCTGGT
CAGACACAAGAGGACACAT-3' (SEQ ID NO:44).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-012 "QSHR5"
demonstrates recognition specificity for 3-bp target sequences GGA, AGA, GAA, and 
CGA. Its binding site preference is GGA > AGA > GAA > CGA as determined by in 
vivo screening results. 

TG-ZFD-012 "QSHR5" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGA, AGA, GAA, or CGA. 

Example 23: TG-ZFD-013 "QSNR1"

TG-ZFD-013 "QSNR1" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: FECKDCGKAFIQKSNLIRHQRTH
(SEQ ID NO:47). It is encoded by the human nucleic acid sequence:
5'-TTTGAGTGTAAAGATTGCGGGAAAGCTTTCATTCAGAAGTCAAACCTCA
TCAGACACCAGAGAACTCAC-3' (SEQ ID NO:46).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-013 "QSNR1"
demonstrates recognition specificity for the 3-bp target sequence GAA.

TG-ZFD-013 "QSNR1" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA. 
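
When several modules recognize the same triplet, it can help to tabulate them by target site. The sketch below is one possible way to do so in Python, using only the preference orders stated for TG-ZFD-009 through TG-ZFD-013. The rank is simply the position of a triplet in the stated order; it is an assumption of this sketch, not a measured affinity, and the names `PREFERENCES` and `domains_for` are hypothetical.

```python
# Preference orders as stated in the text, most preferred site first.
PREFERENCES = {
    "QSHR2": ["GGA"],
    "QSHR3": ["GGA", "GAA"],
    "QSHR4": ["GGA", "GAA"],
    "QSHR5": ["GGA", "AGA", "GAA", "CGA"],
    "QSNR1": ["GAA"],
}


def domains_for(triplet: str) -> list[str]:
    """Return domains that recognize `triplet`, best-ranked first.

    Ties in rank are broken alphabetically by domain name.
    """
    triplet = triplet.upper()
    hits = [
        (sites.index(triplet), name)
        for name, sites in PREFERENCES.items()
        if triplet in sites
    ]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    for site in ("GGA", "GAA", "CGA"):
        print(site, domains_for(site))
```

For GAA, for example, QSNR1 is listed first because GAA is its only stated site, whereas QSHR5 is listed last because GAA is third in its stated order.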

Example 24: TG-ZFD-014 "QSNR2" 

TG-ZFD-014 "QSNR2" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YVCRECRRGFSQKSNLIRHQRTH 
(SEQ ID NO:49). It is encoded by the human nucleic acid sequence: 
5'-TATGTCTGCAGGGAGTGTAGGCGAGGTTTTAGCCAGAAGTCAAATCTCA 
TCAGACACCAGAGGACGCAC-3' (SEQ ID NO:48).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-014 "QSNR2" 
demonstrates recognition specificity for the 3-bp target sequence GAA. 

TG-ZFD-014 "QSNR2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA. 

Example 25: TG-ZFD-015 "QSNV1" 

TG-ZFD-015 "QSNV1" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECNTCRKTFSQKSNLIVHQRTH 
(SEQ ID NO:51). It is encoded by the human nucleic acid sequence:
5'-TATGAATGTAACACATGCAGGAAAACCTTCTCTCAAAAGTCAAATCTCAT
TGTACATCAGAGAACACAC-3' (SEQ ID NO:50).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-015 "QSNV1" 
demonstrates recognition specificity for 3-bp target sequences AAA and CAA. Its 
binding site preference is AAA > CAA as determined by in vivo screening results. 

TG-ZFD-015 "QSNV1" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence AAA or CAA. 

Example 26: TG-ZFD-016 "QSNV2"

TG-ZFD-016 "QSNV2" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YVCSKCGKAFTQSSNLTVHQKIH 
(SEQ ID NO:53). It is encoded by the human nucleic acid sequence: 
5'-TATGTTTGCTCAAAATGTGG^
TGTACATCAAAAAATCCAC-3' (SEQ ID NO:52).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-016 "QSNV2" 
demonstrates recognition specificity for 3-bp target sequences AAA and CAA. Its 
binding site preference is AAA > CAA as determined by in vivo screening results. 

TG-ZFD-016 "QSNV2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence AAA or CAA. 

Example 27: TG-ZFD-017 "QSNV3" 

TG-ZFD-017 "QSNV3" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YKCDECGKNFTQSSNLIVHKRIH
(SEQ ID NO:55). It is encoded by the human nucleic acid sequence:

5'-TACAAATGTGACGAATGTGGAAAAAACTTTACCCAGTCCTCCAACCTTA
TTGTACATAAGAGAATTCAT-3' (SEQ ID NO:54).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-017 "QSNV3" 
demonstrates recognition specificity for a 3-bp target sequence AAA. 

TG-ZFD-017 "QSNV3" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence AAA. 

Example 28: TG-ZFD-018 "QSNV4" 

TG-ZFD-018 "QSNV4" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: 

YECDVCGKTFTQKSNLGVHQRTH (SEQ ID NO:57). It is encoded by the human 
nucleic acid sequence: 
5'-TATGAATGTGATGTGTGTGGAAAAACCTTCACGCAAAAGTCAAACCTTG
GTGTACATCAGAGAACTCAT-3' (SEQ ID NO:56).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-018 "QSNV4"
demonstrates recognition specificity for the 3-bp target sequence AAA.

TG-ZFD-018 "QSNV4" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence AAA.

Example 29: TG-ZFD-019 "QSSR1" 

TG-ZFD-019 "QSSR1" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YKCPDCGKSFSQSSSLIRHQRTH
(SEQ ID NO:59). It is encoded by the human nucleic acid sequence:
5'-TATAAGTGCCCTGATTGTGGGAAGAGTTTTAGTCAGAGTTCCAGCCTCAT
TCGCCACCAGCGGACACAC-3' (SEQ ID NO:58).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-019 "QSSR1"
demonstrates recognition specificity for 3-bp target sequences GTA and GCA. Its
binding site preference is GTA > GCA as determined by in vivo screening results.

TG-ZFD-019 "QSSR1" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GTA or GCA.

Example 30: TG-ZFD-020 "QSSR2"

TG-ZFD-020 "QSSR2" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YECQDCGRAFNQNSSLGRHKRTH
(SEQ ID NO:61). It is encoded by the human nucleic acid sequence:
5'-TATGAGTGTCAGGACTGTGGGAGGGCCTTCAACCAGAACTCCTCCCTGG
GGCGGCACAAGAGGACACAC-3' (SEQ ID NO:60).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-020 "QSSR2"
demonstrates recognition specificity for the 3-bp target sequence GTA.

TG-ZFD-020 "QSSR2" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GTA.

Example 31: TG-ZFD-021 "QSTR"

TG-ZFD-021 "QSTR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YKCEECGK^NQSSTLTRHKIVH 
(SEQ ID NO:63). It is encoded by the human nucleic acid sequence: 
5'-TACAAATGTGAAGAATGTGGC
CTAGACATAAGATAGTTCAT-3' (SEQ ID NO:62).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-021 "QSTR" 
demonstrates recognition specificity for 3-bp target sequences GTA and GCA. Its 
binding site preference is GTA > GCA as determined by in vivo screening results. 

TG-ZFD-021 "QSTR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GTA or GCA. 

Example 32: TG-ZFD-022 "RSHR" 

TG-ZFD-022 "RSHR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YKCMECGKAFNRRSHLTRHQRIH 
(SEQ ID NO:65). It is encoded by the human nucleic acid sequence: 

5'-TATAAGTGCATGGAGTGTGGGAAGGCnTTAACCGCAGGTCACACCTCA
CACGGCACCAGCGGATTCAC-3' (SEQ ID NO:64).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-022 "RSHR"
demonstrates recognition specificity for the 3-bp target sequence GGG. 

TG-ZFD-022 "RSHR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGG. 
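
To illustrate how such modules might be combined, the sketch below assembles the expected composite site for a multi-finger protein. It assumes the conventional arrangement reported for Zif268, in which the N-terminal finger contacts the 3'-most triplet, so that the site read 5' to 3' lists the finger triplets in reverse order. It further assumes that fingers 1 and 2 of Zif268 recognize GCG and TGG, respectively, and that each finger's specificity is unaffected by its neighbors. The function name `composite_site` is hypothetical.

```python
def composite_site(finger_triplets: list[str]) -> str:
    """Return the expected binding site, written 5' to 3'.

    `finger_triplets` lists the triplet recognized by each finger in
    N-terminal to C-terminal order. Under the assumed anti-parallel
    arrangement, the first finger binds the 3'-most triplet.
    """
    for triplet in finger_triplets:
        if len(triplet) != 3 or set(triplet.upper()) - set("ACGT"):
            raise ValueError(f"not a DNA triplet: {triplet!r}")
    return "".join(t.upper() for t in reversed(finger_triplets))


if __name__ == "__main__":
    # Zif268 fingers 1 and 2 (assumed GCG and TGG) followed by "RSHR" (GGG).
    print(composite_site(["GCG", "TGG", "GGG"]))
```

Under these assumptions, a fusion of Zif268 fingers 1 and 2 with TG-ZFD-022 "RSHR" as the third finger would be expected to recognize 5'-GGGTGGGCG-3'.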

Example 33: TG-ZFD-023 "VSSR" 

TG-ZFD-023 "VSSR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YTCKQCGKAFSVSSSLRRHETTH 
(SEQ ID NO:67). It is encoded by the human nucleic acid sequence: 
5'-TATACATGTAAACAGTGTGGGAAAGCCTTCAGTGTTTCCAGTTCCCTTCG
AAGACATGAAAGCACTCAC-3' (SEQ ID NO:66). 

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-023 "VSSR" 
demonstrates recognition specificity for 3-bp target sequences GTT, GTG, and GTA.
Its binding site preference is GTT > GTG > GTA as determined by in vivo screening
results.

TG-ZFD-023 "VSSR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GTT, GTG, or GTA. 

Example 34: TG-ZFD-024 "QAHR" 

TG-ZFD-024 "QAHR" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YKCKECGQAFRQRAHLIRHHKLH
(SEQ ID NO:103). It is encoded by the human nucleic acid sequence:
5'-TATAAGTGTAAGGAATGTGGGCAGGCCTTTAGACAGCGTGCACATCTT
ATTCGACATCACAAACTTCAC-3' (SEQ ID NO:102).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-024 "QAHR"
demonstrates recognition specificity for the 3-bp target sequence GGA as determined
by in vivo screening results.

TG-ZFD-024 "QAHR" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GGA.

Example 35: TG-ZFD-025 "QFNR" 

TG-ZFD-025 "QFNR" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YKCHQCGKAFIQSFNLRRHERTH
(SEQ ID NO:105). It is encoded by the human nucleic acid sequence:
5'-TATAAGTGTCATCAATGTGGGAAAGCCTTTATTCAATCCTTTAACCTTC
GAAGACATGAGAGAACTCAC-3' (SEQ ID NO:104).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-025 "QFNR"
demonstrates recognition specificity for the 3-bp target sequence GAC as determined
by in vivo screening results.

TG-ZFD-025 "QFNR" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAC. 

Example 36: TG-ZFD-026 "QGNR" 

TG-ZFD-026 "QGNR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: FQCNQCGASFTQKGNLLRHIKLH 
(SEQ ID NO:107). It is encoded by the human nucleic acid sequence:

5'-TTCCAGTGTAATCAGTGTGGGGCATCTTTTACTCAGAAAGGTAACCTCC
TCCGCCACATTAAACTGCAC-3' (SEQ ID NO:106).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-026 "QGNR" 
demonstrates recognition specificity for the 3-bp target sequence GAA as determined 
by in vivo screening results. 

TG-ZFD-026 "QGNR" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA. 

Example 37: TG-ZFD-028 "QSHT" 

TG-ZFD-028 "QSHT" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YKCEECGKAFRQSSHLTTHKIIH 
(SEQ ID NO:111). It is encoded by the human nucleic acid sequence:

5'-TACAAATGTGAAGAATGTGGCAAAGCCTTTAGGCAGTCCTCACACOT
CTACACATAAGATAATTCAT-3' (SEQ ID NO:110).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-028 "QSHT" 
demonstrates recognition specificity for the 3-bp target sequences AGA, CGA, TGA,
and GGA. Its binding site preference is (AGA and CGA) > TGA > GGA as
determined by in vivo screening results. 

TG-ZFD-028 "QSHT" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence AGA, CGA, TGA, or GGA.

Example 38: TG-ZFD-029 "QSHV"

TG-ZFD-029 "QSHV" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECDHCGKSFSQSSHLNVHKRTH 
(SEQ ID NO:113). It is encoded by the human nucleic acid sequence:
5'-TATGAGTGTGATCACTGTGGAAAATCCTTTAGCCAGAGCTCTCATCTGAA
TGTGCACAAAAGAACTCAC-3' (SEQ ID NO:112).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-029 "QSHV" 
demonstrates recognition specificity for the 3-bp target sequences CGA, AGA, and
TGA. Its binding site preference is CGA > AGA > TGA as determined by in vivo 
screening results. 

TG-ZFD-029 "QSHV" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence CGA, AGA, or TGA.

Example 39: TG-ZFD-030 "QSNI" 

TG-ZFD-030 "QSNI" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YMCSECGRGFSQKSNLIIHQRTH 
(SEQ ID NO:115). It is encoded by the human nucleic acid sequence:
5'-TACATGTGCAGTGAGTGTGGGCGAGGCTTCAGCCAGAAGTCAAACCTC
ATCATACACCAGAGGACACAC-3' (SEQ ID NO:114).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-030 "QSNI" 
demonstrates recognition specificity for the 3-bp target sequences AAA and CAA as
determined by in vivo screening results. 

TG-ZFD-030 "QSNI" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence AAA or CAA. 

Example 40: TG-ZFD-031 "QSNR3" 

TG-ZFD-031 "QSNR3" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: YECEKCGKAFNQSSNLTRHKKSH
(SEQ ID NO:117). It is encoded by the human nucleic acid sequence:
5'-TATGAATGTGAAAAATGTGGCAAAGCTTTTAACCAGTCCTCAA^
CTAGACATAAGAAAAGTCAT-3' (SEQ ID NO:116).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-031 "QSNR3" 
demonstrates recognition specificity for the 3-bp target sequence GAA as determined 
by in vivo screening results. 

TG-ZFD-031 "QSNR3" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAA. 

Example 41: TG-ZFD-032 "QSSR3" 

TG-ZFD-032 "QSSR3" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECNECGKFFSQSSSLIRHRRSH
(SEQ ID NO:119). It is encoded by the human nucleic acid sequence:

5'-TATGAGTGCAATGAATGTGGGAAGTTTTTTAGCCAGAGCTCCAGCCTCA
TTAGACATAGGAGAAGTCAC-3' (SEQ ID NO:118).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-032 "QSSR3"
demonstrates recognition specificity for the 3-bp target sequences GTA and GCA. Its
binding site preference is GTA > GCA as determined by in vivo screening results. 

TG-ZFD-032 "QSSR3" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GTA or GCA. 

Example 42: TG-ZFD-033 "QTHQ" 

TG-ZFD-033 "QTHQ" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECHDCGKSFRQSTHLTQHRRIH
(SEQ ID NO:121). It is encoded by the human nucleic acid sequence:
5'-TATGAGTGTCACGATTGCGGAAAGTCCTTTAGGCAGAGCACCCACCTCA
CTCAGCACCGGAGGATCCAC-3' (SEQ ID NO:120). 

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-033 "QTHQ" 
demonstrates recognition specificity for the 3-bp target sequences AGA, TGA, and
CGA. Its binding site preference is AGA > (TGA and CGA) as determined by in vivo 
screening results. 

TG-ZFD-033 "QTHQ" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence AGA, TGA, or CGA.

Example 43: TG-ZFD-034 "QTHR1" 

TG-ZFD-034 "QTHR1" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YECHDCGKSFRQSTHLTRHRRIH
(SEQ ID NO:123). It is encoded by the human nucleic acid sequence:
5'-TATGAGTGTCACGATTGCGGAAAGTCCTTTAGGCAGAGCACCCACCTCA
CTCGGCACCGGAGGATCCAC-3' (SEQ ID NO:122).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-034 "QTHR1" 
demonstrates recognition specificity for the 3-bp target sequences GGA, GAA, and
AGA. Its binding site preference is GGA > (GAA and AGA) as determined by in
vivo screening results. 

TG-ZFD-034 "QTHR1" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGA, GAA, or AGA.

Example 44: TG-ZFD-035 "QTHR2" 

TG-ZFD-035 "QTHR2" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: HKCLECGKCFSQNTHLTRHQRTH
(SEQ ID NO:125). It is encoded by the human nucleic acid sequence:
5'-CACAAGTGCCTTGAATCTGGGAAATGCTTCAGTCAGAACACCCATCTG
ACTCGCCACCAACGCACCCAC-3' (SEQ ID NO:124).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-035 "QTHR2" 
demonstrates recognition specificity for the 3-bp target sequence GGA as determined 
by in vivo screening results. 

TG-ZFD-035 "QTHR2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GGA. 

Example 45: TG-ZFD-036 "RDER2"

TG-ZFD-036 "RDER2" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is:

YHCDWDGCGWKFARSDELTRHYRKH (SEQ ID NO:127). It is encoded by the
human nucleic acid sequence:

5'-TACCACTGTGACTGGGACGGCTGTGGATGGAAATTCGCCCGCTCAGAT
GAACTGACCAGGCACTACCGTAAACAC-3' (SEQ ID NO:126).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-036 "RDER2" 
demonstrates recognition specificity for the 3-bp target sequences GCG and GTG. Its
binding site preference is GCG > GTG as determined by in vivo screening results. 

TG-ZFD-036 "RDER2" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GCG or GTG.

Example 46: TG-ZFD-037 "RDER3" 

TG-ZFD-037 "RDER3" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: 

YRCSWEGCEWRFARSDELTRHFRKH (SEQ ID NO:129). It is encoded by the 
human nucleic acid sequence: 

5'-TACAGATGCTCATGGGAAGGGTGTGAGTGGCGTTTTGCAAGAAGTGAT
GAGTTAACCAGGCACTTCCGAAAGCAC-3' (SEQ ID NO:128).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-037 "RDER3" 
demonstrates recognition specificity for the 3-bp target sequences GCG and GTG as
determined by in vivo screening results. 

TG-ZFD-037 "RDER3" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GCG or GTG.

Example 47: TG-ZFD-038 "RDER4" 

TG-ZFD-038 "RDER4" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is: 

FSCSWKGCERRFARSDELSRHRRTH (SEQ ID NO:131). It is encoded by the
human nucleic acid sequence:

5'-TTCAGCTGTAGCTGGAAAGGTTGTGAAAGGAGGTTTGCCCGTTCTGATG
AACTGTCCAGACACAGGCGAACCCAC-3' (SEQ ID NO:130).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-038 "RDER4"
demonstrates recognition specificity for the 3-bp target sequences GCG and GTG as
determined by in vivo screening results. 

TG-ZFD-038 "RDER4" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GCG or GTG.

Example 48: TG-ZFD-039 "RDER5" 

TG-ZFD-039 "RDER5" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: 
FACSWQDCNKKFARSDELARHYRTH (SEQ ID NO:133). It is encoded by the
human nucleic acid sequence: 

5'-TTCGCCTGCAGCTGGCAGGACTGCAACAAGAAGTTCGCGCGCTCCGAC 
GAGCTGGCGCGGCACTACCGCACACAC-3' (SEQ ID NO:132).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-039 "RDER5" 
demonstrates recognition specificity for the 3-bp target sequence GCG as determined
by in vivo screening results. 

TG-ZFD-039 "RDER5" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GCG. 

Example 49: TG-ZFD-040 "RDER6" 

TG-ZFD-040 "RDER6" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: 

YHCNWDGCGWKFARSDELTRHYRKH (SEQ ID NO:135). It is encoded by the
human nucleic acid sequence:

5'-TACCACTGCAACTGGGACGGCTGCGGCTGGAAGTTTGCGCGCTCAGAC
GAGCTCACGCGCCACTACCGAAAGCAC-3' (SEQ ID NO:134).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-040 "RDER6"
demonstrates recognition specificity for the 3-bp target sequences GCG and GTG. Its
binding site preference is GCG > GTG as determined by in vivo screening results. 

TG-ZFD-040 "RDER6" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GCG or GTG.



Example 50: TG-ZFD-041 "RDHR1" 

TG-ZFD-041 "RDHR1" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: 

FLCQYCAQRFGRKDHLTRHMKKSH (SEQ ID NO:137). It is encoded by the
human nucleic acid sequence:

5'-TTCCTCTGTCAGTATTGTGCACAGAGATTTGGGCGAAAGGATCACCTGA
CTCGACATATGAAGAAGAGTCAC-3' (SEQ ID NO:136).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-041 "RDHR1" 
demonstrates recognition specificity for the 3-bp target sequences GAG and GGG as
determined by in vivo screening results. 

TG-ZFD-041 "RDHR1" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAG or GGG.

Example 51: TG-ZFD-043 "RDHT"

TG-ZFD-043 "RDHT" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: FQCKTCQRKFSRSDHLKTHTRTH 
(SEQ ID NO:141). It is encoded by the human nucleic acid sequence:

5'-TTCCAGTGTAAAACTTGTCAGCGAAAGTTCTCCCGGTCCGACCACCTGA
AGACCCACACCAGGACTCAT-3' (SEQ ID NO:140).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-043 "RDHT" 
demonstrates recognition specificity for the 3-bp target sequences TGG, AGG, CGG,
and GGG as determined by in vivo screening results. 

TG-ZFD-043 "RDHT" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence TGG, AGG, CGG, or GGG.

Example 52: TG-ZFD-044 "RDKI"

TG-ZFD-044 "RDKI" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: FACEVCGVRFTRNDKLKIHMRKH
(SEQ ID NO:143). It is encoded by the human nucleic acid sequence:
5'-TTTGCCTGCGAGGTCTGCGGTGTTCGATTCACCAGGAACGACAAGCTGA
AGATCCACATGCGGAAGCAC-3' (SEQ ID NO:142).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-044 "RDKI" 
demonstrates recognition specificity for the 3-bp target sequence GGG as determined 
by in vivo screening results. 

TG-ZFD-044 "RDKI" can be used as a module to construct a chimeric DNA
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GGG. 

Example 53: TG-ZFD-045 "RDKR"

TG-ZFD-045 "RDKR" was identified by in vivo screening from human
genomic sequence. Its amino acid sequence is:

YVCDVEGCTWKFARSDKLNRHKKRH (SEQ ID NO:145). It is encoded by the
human nucleic acid sequence:

5'-TATGTATGCGATGTAGAGGGATGTACGTGGAAATTTGCCCGCTCAGATA
AGCTCAACAGACACAAGAAAAGGCAC-3' (SEQ ID NO:144).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-045 "RDKR"
demonstrates recognition specificity for the 3-bp target sequences GGG and AGG. Its
binding site preference is GGG > AGG as determined by in vivo screening results. 

TG-ZFD-045 "RDKR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of
recognizing a DNA site containing the sequence GGG or AGG.

Example 54: TG-ZFD-046 "RSNR"

TG-ZFD-046 "RSNR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YICRKCGRGFSRKSNLIRHQRTH 
(SEQ ID NO:147). It is encoded by the human nucleic acid sequence:
5'-TATATTTGCAGAAAGTGTGGACGGGG
TCAGACATCAGAGGACACAC-3' (SEQ ID NO:146).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-046 "RSNR"
demonstrates recognition specificity for the 3-bp target sequences GAG and GTG. Its
binding site preference is GAG > GTG as determined by in vivo screening results. 

TG-ZFD-046 "RSNR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAG or GTG.

Example 55: TG-ZFD-047 "RTNR"

TG-ZFD-047 "RTNR" was identified by in vivo screening from human 
genomic sequence. Its amino acid sequence is: YLCSECDKCFSRSTNLIRHRRTH 
(SEQ ID NO:149). It is encoded by the human nucleic acid sequence:

5'-TATCTATGTAGTGAGTGTGACAAATGCTTCAGTAGAAGTACAAACCTCA
TAAGGCATCGAAGAACTCAC-3' (SEQ ID NO:148).

As a polypeptide fusion to fingers 1 and 2 of Zif268, TG-ZFD-047 "RTNR" 
demonstrates recognition specificity for the 3-bp target sequence GAG as determined 
by in vivo screening results. 

TG-ZFD-047 "RTNR" can be used as a module to construct a chimeric DNA 
binding protein comprising multiple zinc finger domains, e.g., for the purpose of 
recognizing a DNA site containing the sequence GAG. 

A number of embodiments of the invention have been described. Nevertheless, 
it will be understood that various modifications may be made without departing from 
the spirit and scope of the invention. Accordingly, other embodiments are within the 
scope of the following claims. 

WHAT IS CLAIMED IS:

1. A method of identifying a zinc finger domain that recognizes a target site
on a DNA, the method comprising:

(a) providing cells containing a reporter construct, the construct
comprising a reporter gene operably linked to a promoter, wherein the reporter gene is
expressed above a given level when a transcription factor recognizes both a
recruitment site and a target site of the promoter, but not when the transcription factor
recognizes only the recruitment site of the promoter;

(b) providing a plurality of hybrid nucleic acids, each of which
encodes a non-naturally occurring protein comprising (i) a transcription activation
domain, (ii) a DNA binding domain that recognizes the recruitment site, and (iii) a
test zinc finger domain, wherein the encoded amino acid sequence of the test zinc
finger domain varies among the members of the plurality;

(c) contacting the plurality of hybrid nucleic acids with the cells
under conditions that permit at least one of the plurality of nucleic acids to enter at
least one of the cells;

(d) maintaining the cells under conditions permitting expression of
the hybrid nucleic acids in the cells; and

(e) identifying a cell that contains a hybrid nucleic acid of (b) and
that expresses the reporter gene above the given level as an indication that the cell
contains a hybrid nucleic acid encoding a test zinc finger domain that recognizes the
target site.

2. The method of claim 1, wherein the cells are eukaryotic cells. 

3. The method of claim 2, wherein the cells are yeast cells. 

4. The method of claim 3, wherein the cells are Saccharomyces cerevisiae
cells.

5. The method of claim 1, wherein the reporter gene is a selectable marker. 

6. The method of claim 5, wherein the selectable marker is selected from the 
group consisting of URA3, HIS3, LEU2, ADE2, and TRP1.

7. The method of claim 1, wherein the reporter gene is selected from the group
consisting of lacZ, CAT, luciferase, GUS, and GFP. 



8. The method of claim 1, wherein the DNA binding domain comprises a zinc 
finger domain. 

9. The method of claim 8, wherein the DNA binding domain comprises two
zinc finger domains.

10. The method of claim 9, wherein the DNA binding domain comprises three 
zinc finger domains. 

11. The method of claim 1, further comprising the steps of (i) amplifying a
source nucleic acid encoding the test zinc finger domain from genomic nucleic acid, a
messenger RNA (mRNA) mixture, or a complementary DNA (cDNA) mixture, using
an oligonucleotide primer that anneals to a sequence encoding a conserved domain
boundary to produce an amplified fragment; and (ii) utilizing the amplified fragment
to construct a hybrid nucleic acid for inclusion in the plurality of hybrid nucleic acids
of step (b).

12. The method of claim 1, further comprising the steps of (i) identifying a
candidate zinc finger domain amino acid sequence in a sequence database; (ii)
providing a candidate nucleic acid encoding the candidate zinc finger domain amino
acid sequence; and (iii) utilizing the candidate nucleic acid to construct a hybrid
nucleic acid for inclusion in the plurality of hybrid nucleic acids of step (b).
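
Step (i) of claim 12 could, for instance, be approached with a pattern search over protein sequences. The sketch below is illustrative only: it assumes a commonly cited Cys2-His2 spacing (two residues, Cys, 2 to 4 residues, Cys, 12 residues, His, 3 to 5 residues, His), which is consistent with the domains listed in the Examples but is not stated in the claim. The names `C2H2_PATTERN` and `find_candidates` and the test protein are hypothetical.

```python
import re

# Assumed consensus: X2-Cys-X(2-4)-Cys-X12-His-X(3-5)-His.
C2H2_PATTERN = re.compile(r"..C.{2,4}C.{12}H.{3,5}H")


def find_candidates(protein: str) -> list[str]:
    """Return non-overlapping substrings matching the assumed C2H2 pattern."""
    return [m.group(0) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical protein: two domains from the Examples joined by
    # made-up flanking and linker residues.
    protein = (
        "MSTGEKP"
        "YACHLCGKAFTQCSHLRRHEKTH"      # TG-ZFD-010 "QSHR3"
        "TGEKP"
        "YHCDWDGCGWKFARSDELTRHYRKH"    # TG-ZFD-036 "RDER2"
        "AAA"
    )
    for domain in find_candidates(protein):
        print(domain)
```

A pattern of this kind only nominates candidates; whether a candidate recognizes a given target site would still be determined by the screening steps of claim 1.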

13. The method of claim 5, wherein the selectable marker is an auxotrophy 
gene required for the synthesis of a metabolite; the genome of the cells lacks a 
functional copy of the auxotrophy gene; and, during step (d), the cells are maintained 
in a medium prepared without the metabolite. 

14. The method of claim 1, wherein steps (a) to (e) are repeated to identify a
second test zinc finger domain that recognizes a second target site.

15. The method of claim 14, further comprising constructing a nucleic acid
encoding a polypeptide comprising the first test zinc finger domain and the second 
test zinc finger domain. 

16. A method of identifying a zinc finger domain that recognizes a target site
on a DNA, the method comprising:

(a) providing cells containing a reporter construct, the construct
comprising a reporter gene operably linked to a promoter, wherein the reporter gene is
expressed above a given level when a transcription factor recognizes both a
recruitment site and a target site of the promoter, but not when the transcription factor
recognizes only the recruitment site of the promoter;

(b) amplifying a plurality of nucleic acid sequences, each of which
encodes a test zinc finger domain, using an oligonucleotide primer that anneals to a
nucleic acid encoding a conserved domain boundary;

(c) joining each nucleic acid sequence of (b) to nucleic acid
sequences encoding (i) a transcription activation domain, and (ii) a DNA binding
domain that recognizes the recruitment site, to form a plurality of hybrid nucleic
acids;

(d) contacting the plurality of hybrid nucleic acids of (c) with the
cells of (a) under conditions that permit at least one of the plurality of hybrid nucleic
acids to enter at least one of the cells;

(e) maintaining the cells under conditions permitting expression of
the hybrid nucleic acids in the cells; and

(f) identifying a cell that contains a hybrid nucleic acid of (c) and
that expresses the reporter gene above the given level, wherein the hybrid nucleic acid
encodes a zinc finger domain that recognizes the target site on a DNA.

17. The method of claim 16, wherein the cells are yeast cells. 
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18. The method of claim 16, wherein the reporter gene is selected from the 
group consisting of lacZ 9 CAT, luciferase, GUS, and GFR 

19. The method of claim 16, wherein the DNA binding domain comprises a 
zinc finger domain. 

5 20. The method of claim 19, wherein the DNA binding domain comprises two 

zinc finger domains. 

21. A method of determining whether a test zinc finger domain recognizes a 
target site on a promoter, the method comprising: 

(a) providing a reporter construct comprising a reporter gene 
operably linked to a promoter, wherein the reporter gene is expressed above a given 
level when a transcription factor recognizes both a recruitment site and a target site of 
the promoter, but not when the transcription factor recognizes only the recruitment 
site of the promoter; 

(b) providing a hybrid nucleic acid that encodes a non-naturally 
occurring protein comprising (i) a transcription activation domain, (ii) a DNA binding 
domain that recognizes the recruitment site, and (iii) a test zinc finger domain; 

(c) contacting the reporter construct with a cell under conditions 
that permit the reporter construct to enter the cell; 

(d) prior to, after, or concurrent with step (c), contacting the hybrid 
nucleic acid with the cell under conditions that permit the hybrid nucleic acid to enter 
the cell; 

(e) maintaining the cell under conditions permitting expression of 
the hybrid nucleic acid in the cell; and 

(f) detecting reporter gene expression in the cell, wherein a level 
of reporter gene expression greater than the given level is an indication that the test 
zinc finger domain recognizes the target site. 

22. The method of claim 21, further comprising the step of amplifying a 
nucleic acid encoding the test zinc finger domain from genomic DNA, an mRNA 
mixture or a cDNA mixture using an oligonucleotide primer that anneals to a 
sequence encoding a conserved domain boundary. 

23. The method of claim 21, further comprising the steps of (i) identifying a 
candidate zinc finger domain amino acid sequence in a sequence database; (ii) 
providing a candidate nucleic acid encoding the candidate zinc finger domain amino 
acid sequence; and (iii) utilizing the candidate nucleic acid to construct a hybrid 
nucleic acid for inclusion in the plurality of hybrid nucleic acids of step (b). 

24. A method of determining whether a test zinc finger domain recognizes a 
target site on a promoter, the method comprising: 

(a) providing a first cell comprising a reporter construct 
comprising a reporter gene operably linked to a promoter, wherein the reporter gene is 
expressed above a given level when a transcription factor recognizes both a 
recruitment site and a target site of the promoter, but not when the transcription factor 
recognizes only the recruitment site of the promoter; 

(b) providing a second cell comprising a hybrid nucleic acid that 
encodes a protein comprising (i) a transcription activation domain, (ii) a DNA binding 
domain that recognizes the recruitment binding site, and (iii) a test zinc finger 
domain; 

(c) fusing the first and second cells to form a fused cell; 

(d) maintaining the fused cell under conditions permitting 
expression of the hybrid nucleic acids in the cell; and 

(e) detecting reporter gene expression in the fused cell, wherein a 
level of reporter gene expression greater than the given level is an indication that the 
test zinc finger domain recognizes the target site. 

25. The method of claim 24, wherein the first and second cells are yeast cells 
of the opposite mating types. 

26. A method of determining whether a test zinc finger domain recognizes a 
target site on a promoter, the method comprising: 

(a) providing a plurality of reporter constructs, each construct 
comprising a reporter gene operably linked to a promoter, wherein the reporter gene is 
expressed above a given level when a transcription factor recognizes both a 
recruitment site and a target site of the promoter, but not when the transcription factor 
recognizes only the recruitment site of the promoter; 

(b) providing a cell containing a hybrid nucleic acid that encodes a 
non-naturally occurring protein comprising (i) a transcription activation domain, (ii) a 
DNA binding domain that recognizes the recruitment site, and (iii) a test zinc finger 
domain; 

(c) contacting the plurality of reporter constructs with the cell 
under conditions that permit at least one of the plurality of reporter constructs to enter 
the cell; 

(d) maintaining the cell under conditions permitting expression of 
the hybrid nucleic acid in the cell; and 

(e) identifying a cell that contains a reporter gene of (a) and that 
expresses the reporter gene above the given level as an indication that the reporter 
construct in the cell comprises a target site recognized by the test zinc finger domain. 

27. The method of claim 26, wherein the target binding site is between two 
and six nucleotides long. 

28. The method of claim 27, wherein the plurality of reporter constructs 
comprises every possible combination of A, T, G, and C nucleotides at at least two 
positions of the target binding site. 

29. The method of claim 28, wherein the plurality of reporter constructs 
comprises every possible combination of A, T, G, and C nucleotides at at least three 
positions of the target binding sites. 

30. The method of claim 26, wherein steps (a) to (e) are repeated for a second 
test zinc finger domain to identify a second binding preference. 

31. The method of claim 30, further comprising constructing a nucleic acid 
encoding a polypeptide comprising the first and second test zinc finger domains. 

32. A method of identifying a plurality of zinc finger domains, the method 
comprising: 

carrying out the method of claim 1 to identify a first test zinc finger 
domain; and 

carrying out the method of claim 1 again to identify a second test zinc 
finger domain that recognizes a target site different from the target site recognized by 
the first test zinc finger domain. 

33. A method of generating a nucleic acid encoding a chimeric zinc finger 
protein, the method comprising: 

carrying out the method of claim 32; 

constructing a nucleic acid encoding a polypeptide comprising the first 
and second test zinc finger domains. 

34. A method of identifying DNA sequences recognized by zinc finger 
domains, the method comprising: 

carrying out the method of claim 24 to identify a first target site 
recognized by a first test zinc finger domain; and 

carrying out the method of claim 24 again to identify a second target 
site recognized by a second test zinc finger domain. 

35. A method of generating a nucleic acid encoding a chimeric zinc finger 
protein, the method comprising: 

carrying out the method of claim 34; 

constructing a nucleic acid encoding a polypeptide comprising the first 
and second test zinc finger domains. 

36. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Cys-X-Ser-Asn-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:68), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

37. A nucleic acid comprising a sequence encoding the polypeptide of claim 36. 

38. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-His-X-Ser-Asn-Xb-X-Lys-His-X3-5-His 
(SEQ ID NO:69), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

39. A nucleic acid comprising a sequence encoding the polypeptide of claim 38. 

40. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Ser-X-Ser-Asn-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:70), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

41. A nucleic acid comprising a sequence encoding the polypeptide of claim 40. 

42. A purified polypeptide comprising the amino acid sequence: 

X^X-Cys-X^-Cys-X.-X^ 
(SEQ ID NO:71), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

43. A nucleic acid comprising a sequence encoding the polypeptide of claim 42. 

44. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Val-X-Ser-Xc-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:72), 

wherein Xa is phenylalanine or tyrosine, Xb is a hydrophobic residue, and Xc is 
serine or threonine. 

45. A nucleic acid comprising a sequence encoding the polypeptide of claim 44. 

46. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-His-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:73), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

47. A nucleic acid comprising a sequence encoding the polypeptide of claim 46. 

48. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Val-His-X3-5-His 
(SEQ ID NO:74), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

49. A nucleic acid comprising a sequence encoding the polypeptide of claim 48. 

50. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Xc-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:75), 

wherein Xa is phenylalanine or tyrosine, Xb is a hydrophobic residue, and Xc is 
serine or threonine. 

51. A nucleic acid comprising a sequence encoding the polypeptide of claim 50. 

52. A purified polypeptide comprising an amino acid sequence 60% identical 
to SEQ ID NO:65. 

53. A nucleic acid comprising a sequence encoding the polypeptide of claim 52. 

54. A purified polypeptide, comprising an amino acid sequence 60% identical 
to an amino acid sequence selected from the group consisting of: SEQ ID NO:29, 127, 
129, 131, 133, and 135. 

55. A nucleic acid, comprising a sequence encoding the polypeptide of claim 54. 

56. A purified polypeptide comprising the amino acid sequence: 

15 X a -X-Cys-X 2 _ 5 -Cys-X 3 -^ 
(SEQ ID NO: 150), 

wherein X a is phenylalanine or tyrosine, and X b is a hydrophobic residue. 

57. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 56. 

58. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Phe-Asn-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:151), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

59. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 58. 

60. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-His-Xb-X-Thr-His-X3-5-His 
(SEQ ID NO:152), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

61. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 60. 

62. A purified polypeptide comprising the amino acid sequence: 

X.-X-Cy^X M -C^ 
(SEQ ID NO: 153), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

63. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 62. 

64. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Ile-His-X3-5-His 
(SEQ ID NO:154), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

65. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 64. 

66. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Gln-X-Ser-Asn-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:155), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

67. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 66. 

68. A purified polypeptide comprising the amino acid sequence: 

X a -X-Cys-X 2 _ 5 -Cys-X r X^ 
(SEQ ID NO:156), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

69. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 68. 

70. A purified polypeptide comprising the amino acid sequence: 

Cys-X2-5-Cys-X3-Xa-X-Gln-X-Thr-His-Xb-X-Arg-His-X3-5-His 
(SEQ ID NO:157), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

71. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 70. 

72. A purified polypeptide comprising the amino acid sequence: 

Xa-X-Cys-X2-5-Cys-X3-Xa-X-Arg-X-Asp-Lys-Xb-X-Ile-His-X3-5-His 
(SEQ ID NO:158), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

73. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 72. 

74. A purified polypeptide comprising the amino acid sequence: 

20 X^X-Cys-X^-tys^-X^ 
(SEQ ID NO:159), 

wherein Xa is phenylalanine or tyrosine, and Xb is a hydrophobic residue. 

75. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 74. 

76. A purified polypeptide comprising an amino acid sequence 60% 
identical to SEQ ID NO:107. 

77. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 76. 

78. A purified polypeptide comprising an amino acid sequence 60% 
identical to SEQ ID NO:137. 

79. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 78. 

80. A purified polypeptide comprising an amino acid sequence 60% 
identical to SEQ ID NO:145. 

81. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 80. 

82. A purified polypeptide comprising an amino acid sequence 60% 
identical to SEQ ID NO:149. 

83. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 82. 

84. A purified polypeptide comprising an amino acid sequence 60% 
identical to SEQ ID NO:141. 

85. A nucleic acid comprising a sequence encoding the polypeptide of 
claim 84. 
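
Claims 27 to 29 recite reporter libraries that cover every possible combination of A, T, G, and C at two or three positions of a target binding site. As an illustrative sketch only, and not part of the claimed method, the following Python enumerates such a library. The function name `target_site_library`, the example templates, and the use of `N` to mark a varied position are assumptions made for this example.

```python
from itertools import product

BASES = "ATGC"

def target_site_library(template: str) -> list[str]:
    """Expand every 'N' in the template into all four nucleotides."""
    # Indices of the positions that are varied across the library.
    slots = [i for i, ch in enumerate(template) if ch == "N"]
    library = []
    for combo in product(BASES, repeat=len(slots)):
        site = list(template)
        for pos, base in zip(slots, combo):
            site[pos] = base
        library.append("".join(site))
    return library

# Hypothetical four-nucleotide sites varied at two and at three positions.
two_positions = target_site_library("GNNT")
three_positions = target_site_library("NNNT")
print(len(two_positions), len(three_positions))
```

Varying two positions gives 4^2 = 16 sites and varying three gives 4^3 = 64, so the number of reporter constructs needed grows by a factor of four with each added position.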
Figure 3 

Figure 6 



HIV-1 LTR (-124/-115): 5'-GAC ATC GAGC-3' (SEQ ID NO:1) 

HIV-1 LTR (-23/-14): 5'-GCA GCT GCTT-3' (SEQ ID NO:2) 

HIV-1 LTR (-95/-86): 5'-GCT GGG GACT-3' (SEQ ID NO:3) 

Human CCR5 (-70/-79): 5'-AGG GTG GAGT-3' (SEQ ID NO:4) 

Human CCR5 (+7/+16): 5'-GCT GAG ACAT-3' (SEQ ID NO:5) 

Figure 7 

GCGT (optimal Zif268-binding site) 

5'-CCG GCGTGGGCG GCT GCGTGGGCG T GCGTGGGCG GACT GCGTGGGCG-3' (SEQ ID NO:6) 
3'-GCACCCGC CGA CGCACCCGC A CGCACCCGC CTGA CGCACCCGC AGCT-5' (SEQ ID NO:7) 

GAGC (HIV-1 LTR, -118/-115) 

5'-CCGGC GAGCGGGCG GTC GAGCGGGCG T GAGCGGGCG GATC GAGCGGGCG-3' (SEQ ID NO:8) 
3'-G CTCGCCCGC CAG CTCGCCCGC A CTCGCCCGC CTAG CTCGCCCGC AGCT-5' (SEQ ID NO:9) 

GCTT (HIV-1 LTR, -17/-14) 

5'-CCGGCT GCTTGGGCG GCT GCTTGGGCG T GCTTGGGCG GGCT GCTTGGGCG-3' (SEQ ID NO:10) 
3'-GA CGAACCCGC CGA CGAACCCGC A CGAACCCGC CCGA CGAACCCGC AGCT-5' (SEQ ID NO:11) 

GACT (HIV-1 LTR, -89/-86) 

5'-CCG GACTGGGCG GGG GACTGGGCG T GACTGGGCG GAGG GACTGGGCG-3' (SEQ ID NO:12) 
3'-TGACCCGC CCC CTGACCCGC A CTGACCCGC CTCC CTGACCCGC AGCT-5' (SEQ ID NO:13) 

GAGT (Human CCR5, -76/-79) 

5'-CCG GAGTGGGCG GTG GAGTGGGCG T GAGTGGGCG GATG GAGTGGGCG-3' (SEQ ID NO:14) 
3'-TCACCCGC CAC CTCACCCGC A CTCACCCGC CTAC CTCACCCGC AGCT-5' (SEQ ID NO:15) 

ACAT (Human CCR5, +13/+16) 

5'-CCGG ACATGGGCG GAG ACATGGGCG T ACATGGGCG GAAG ACATGGGCG-3' (SEQ ID NO:16) 
3'-TGTACCCGC CTC TGTACCCGC A TGTACCCGC CTTC TGTACCCGC AGCT-5' (SEQ ID NO:17) 
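
Each pair of oligonucleotides in Figure 7 is drawn as a duplex in which both strands carry a four-nucleotide 5' extension. A minimal sketch of how that pairing could be checked for the GCGT pair (SEQ ID NOS:6 and 7) is given below. The helper names `reverse_complement` and `anneals_with_overhangs` are hypothetical, and the last base of SEQ ID NO:7 is taken to be g, which is an assumption consistent with the lower strand drawn in the figure.

```python
# Both strands are written 5' to 3', in the ten-base blocks of the listing.
TOP_GCGT = "ccggcgtggg cggctgcgtg ggcgtgcgtg ggcggactgc gtgggcg".replace(" ", "")
BOTTOM_GCGT = "tcgacgccca cgcagtccgc ccacgcacgc ccacgcagcc gcccacg".replace(" ", "")

COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def anneals_with_overhangs(top: str, bottom: str, overhang: int = 4) -> bool:
    """True if the strands pair everywhere except each strand's 5' overhang."""
    return reverse_complement(top[overhang:]) == bottom[overhang:]

print(anneals_with_overhangs(TOP_GCGT, BOTTOM_GCGT))
print(TOP_GCGT[:4], BOTTOM_GCGT[:4])  # the unpaired 5' ends
```

The same check can be applied to the other pairs in the figure by substituting the corresponding sequences from the listing.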
Figure 8 

Figure 9 

                         E   R   P   Y   A   C   P   V   E   S   C
GGGTCGACCTTCCGGACTGGCCAG GAA CGC CCA TAT GCT TGC CCT GTC GAG TCC TGC
   SalI   BspEI  MscI

D   R   R   F   S   R   S   D   E   L   T   R   H   I   R   I   H   T   G
GAT CGC CGC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACT GGC
                                                                    MscI

Q   K   P   F   Q   C   R   I   C   M   R   N   F   S   R   S   D   H   L
CAG AAG CCC TTC CAG TGT CGA ATC TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC CTT

T   T   H   I   R   T   H   T   G   E   K   P   F   A   C   D   I   C   G
ACC ACC CAC ATC CGG ACC CAC ACC GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG
             BspEI        SgrAI

R   K   F   A   R   S   D   E   R   K   R   H   T   K   I   H   L   R   Q
AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG

K   D                   (SEQ ID NO:21)
AAG GAT CCG CGG GAA TCC (SEQ ID NO:20)
        SacII   EcoRI

Figure 11 



CSNR: 

T   G   Q   K   P   Y   K   C   K   Q   C   G   K   A   F   G   C   P   S
ACC GGG CAG AAA CCG TAC AAA TGT AAG CAA TGT GGG AAA GCT TTT GGA TGT CCC TCA
                                                                -1      2

N   L   R   R   H   G   R   T   H   T   G   E   K   P   R   (SEQ ID NO:23)
AAC CTT CGA AGG CAT GGA AGG ACT CAC ACC GGC GAG AAA CCG CGG (SEQ ID NO:22)

HSNK: 

T   G   E   K   P   Y   K   C   K   E   C   G   K   A   F   N   H   S   S
ACC GGG GAG AAG CCA TAC AAG TGT AAG GAG TGT GGG AAA GCC TTC AAC CAC AGC TCC
                                                                -1      2

N   F   N   K   H   H   R   I   H   T   G   E   K   P   R   (SEQ ID NO:25)
AAC TTC AAT AAA CAC CAC AGA ATC CAC ACC GGC GAA AAG CCG CGG (SEQ ID NO:24)

SSNR: 

T   G   E   R   P   F   E   C   K   E   C   G   K   A   F   S   S   G   S
ACC GGG GAG AGG CCA TTT GAA TGT AAG GAA TGT GGG AAA GCC TTT AGT AGT GGT TCA
                                                                -1      2

N   F   T   R   H   Q   R   I   H   T   G   E   K   P   R   (SEQ ID NO:27)
AAC TTC ACT CGA CAT CAG AGA ATT CAC ACC GGT GAA AAG CCG CGG (SEQ ID NO:26)

RDER: 

T   G   Q   K   P   Y   V   C   D   V   E   G   C   T   W   K   F   A   R
ACC GGG CAG AAG CCA TAC GTA TGC GAT GTA GAG GGA TGT ACG TGG AAA TTT GCC CGC
                                                                        -1

S   D   E   L   N   R   H   K   K   R   H   T   G   E   R   P   R   (SEQ ID NO:29)
TCA GAT GAG CTC AAC AGA CAC AAG AAA AGG CAC ACC GGC GAA AGA CCG CGG (SEQ ID NO:28)
    2   3           6

QSTV: 

T   G   E   R   P   Y   E   C   N   E   C   G   K   A   F   A   Q   N   S
ACC GGG GAG AGA CCT TAC GAG TGT AAT GAA TGC GGG AAA GCT TTT GCC CAA AAT TCA
                                                                -1      2

T   L   R   V   H   Q   R   I   H   T   G   E   K   P   R   (SEQ ID NO:31)
ACT CTC AGA GTA CAC CAG AGA ATT CAC ACC GGC GAA AAG CCG CGG (SEQ ID NO:30)

VSTR: 

T   G   E   R   P   Y   E   C   N   Y   C   G   K   T   F   S   V   S   S
ACC GGG GAG AGG CCT TAT GAG TGT AAT TAC TGT GGA AAA ACC TTT AGT GTG AGC TCA
                                                                -1      2

T   L   I   R   H   Q   R   I   H   T   G   E   R   P   R   (SEQ ID NO:33)
ACC CTT ATT AGA CAT CAG AGA ATC CAC ACC GGC GAG AGA CCG CGG (SEQ ID NO:32)
3           6
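
Figure 11 pairs each nucleic acid sequence with its encoded amino acid sequence. As a minimal sketch, assuming the standard genetic code, the following Python translates the CSNR coding sequence (SEQ ID NO:22) so that the result can be compared with the one-letter residues shown in the figure. The names `CODON_TABLE`, `translate`, and `CSNR_DNA` are introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence (spaces allowed) into one-letter residues."""
    dna = dna.replace(" ", "").lower()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# CSNR coding sequence as printed in Figure 11 (SEQ ID NO:22).
CSNR_DNA = (
    "ACC GGG CAG AAA CCG TAC AAA TGT AAG CAA TGT GGG AAA GCT TTT GGA TGT CCC TCA "
    "AAC CTT CGA AGG CAT GGA AGG ACT CAC ACC GGC GAG AAA CCG CGG"
)
print(translate(CSNR_DNA))
```

Running the same function over the other coding sequences in the figure gives a quick way to catch transcription errors in either the nucleotide or the amino acid line.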
SEQUENCE LISTING 

<110> Kim, Jin-Soo 
Kwon, Young-Do 
Kim, Hyun-Won 
Ryu, Eun Hyun 
Hwang, Moon Sun 

<120> SELECTION OF TARGET-SPECIFIC ZINC FINGER DOMAINS 

<130> 12279-002001 

<160> 167 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 10 
<212> DNA 
<213> HIV-1 

<400> 1 
gacatcgagc 
10 

<210> 2 
<211> 10 
<212> DNA 
<213> HIV-1 

<400> 2 
gcagctgctt 
10 

gctgagacat 10 

<210> 6 
<211> 47 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> optimal binding site 

<400> 6 
ccggcgtggg cggctgcgtg ggcgtgcgtg ggcggactgc gtgggcg 
47 

<210> 7 
<211> 47 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> optimal binding site 

<400> 7 
tcgacgccca cgcagtccgc ccacgcacgc ccacgcagcc gcccacg 
47 

<210> 8 
<211> 49 
<212> DNA 
<213> HIV-1 

<400> 8 

ccggcgagcg ggcggtcgag cgggcgtgag cgggcggatc gagcgggcg 
49 

<210> 9 
<211> 49 
<212> DNA 
<213> HIV-1 

<400> 9 

tcgacgcccg ctcgatccgc ccgctcacgc ccgctcgacc gcccgctcg 
49 

<210> 10 
<211> 50 
<212> DNA 
<213> HIV-1 

<400> 10 

ccggctgctt gggcggctgc ttgggcgtgc ttgggcgggc tgcttgggcg 
50 

<210> 11 
<211> 50 
<212> DNA 
<213> HIV-1 

<400> 11 
tcgacgccca agcagcccgc ccaagcacgc ccaagcagcc gcccaagcag 
50 

<210> 12 
<211> 47 
<212> DNA 
<213> HIV-1 

<400> 12 

ccggactggg cgggggactg ggcgtgactg ggcggaggga ctgggcg 
47 

<210> 13 
<211> 47 
<212> DNA 
<213> HIV-1 

<400> 13 

tcgacgccca gtccctccgc ccagtcacgc ccagtccccc gcccagt 
47 

<210> 14 
<211> 47 
<212> DNA 

<213> Homo sapiens 
<400> 14 

ccggagtggg cggtggagtg ggcgtgagtg ggcggatgga gtgggcg 
47 

<210> 15 
<211> 47 
<212> DNA 

<213> Homo sapiens 
<400> 15 

tcgacgccca ctccatccgc ccactcacgc ccactccacc gcccact 
47 

<210> 16 
<211> 48 
<212> DNA 

<213> Homo sapiens 
<400> 16 

ccggacatgg gcggagacat gggcgtacat gggcggaaga catgggcg 
48 

<210> 17 
<211> 48 
<212> DNA 

<213> Homo sapiens 
<400> 17 

tcgacgccca tgtcttccgc ccatgtacgc ccatgtctcc gcccatgt 
48 

<210> 18 
<211> 120 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> plasmid sequence 
<221> CDS 
<222> (1)...(81) 

<400> 18 
aaa gag ggt ggg tcg acc ttc cgg act ggc cag gaa cgc cca gat ccg 
Lys Glu Gly Gly Ser Thr Phe Arg Thr Gly Gln Glu Arg Pro Asp Pro 
1 5 10 15 

cgg gaa ttc aga tct act agt gcg gcc gct aag taagtaagac gtcgagctcg 
Arg Glu Phe Arg Ser Thr Ser Ala Ala Ala Lys 
20 25 

ccatcgcggt ggaagcttt 
120 

<210> 19 
<211> 27 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> plasmid sequence 

<400> 19 
Lys Glu Gly Gly Ser Thr Phe Arg Thr Gly Gln Glu Arg Pro Asp Pro 
1 5 10 15 

Arg Glu Phe Arg Ser Thr Ser Ala Ala Ala Lys 
20 25 

<210> 20 
<211> 303 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> plasmid sequence 
<221> CDS 
<222> (25)...(291) 

<400> 20 
gggtcgacct tccggactgg ccag gaa cgc cca tat gct tgc cct gtc gag 
51 
Glu Arg Pro Tyr Ala Cys Pro Val Glu 
1 5 

tcc tgc gat cgc cgc ttt tct cgc tcg gat gag ctt acc cgc cat atc 
99 
Ser Cys Asp Arg Arg Phe Ser Arg Ser Asp Glu Leu Thr Arg His Ile 
10 15 20 25 

cgc atc cac act ggc cag aag ccc ttc cag tgt cga atc tgc atg cgt 
147 
Arg Ile His Thr Gly Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg 
30 35 40 

aac ttc agt cgt agt gac cac ctt acc acc cac atc cgg acc cac acc 
195 
Asn Phe Ser Arg Ser Asp His Leu Thr Thr His Ile Arg Thr His Thr 
45 50 55 

ggc gag aag cct ttt gcc tgt gac att tgt ggg agg aag ttt gcc agg 
243 
Gly Glu Lys Pro Phe Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Arg 
60 65 70 

agt gat gaa cgc aag agg cat acc aaa atc cat tta aga cag aag gat 
291 
Ser Asp Glu Arg Lys Arg His Thr Lys Ile His Leu Arg Gln Lys Asp 
75 80 85 

ccgcgggaat cc 
303 



<210> 21 
<211> 89 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> plasmid sequence 

<400> 21 
Glu Arg Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser 
1 5 10 15 

Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly Gln Lys 
20 25 30 

Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp His 
35 40 45 

Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe Ala Cys 
50 55 60 

Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys Arg His 
65 70 75 80 

Thr Lys Ile His Leu Arg Gln Lys Asp 
85 



<210> 22 
<211> 102 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(102) 

<400> 22 
acc ggg cag aaa ccg tac aaa tgt aag caa tgt ggg aaa gct ttt gga 
48 
Thr Gly Gln Lys Pro Tyr Lys Cys Lys Gln Cys Gly Lys Ala Phe Gly 
1 5 10 15 

tgt ccc tca aac ctt cga agg cat gga agg act cac acc ggc gag aaa 
96 
Cys Pro Ser Asn Leu Arg Arg His Gly Arg Thr His Thr Gly Glu Lys 
20 25 30 

ccg cgg 
102 
Pro Arg 

<210> 23 
<211> 34 
<212> PRT 
<213> Homo sapiens 

<400> 23 
Thr Gly Gln Lys Pro Tyr Lys Cys Lys Gln Cys Gly Lys Ala Phe Gly 
1 5 10 15 

Cys Pro Ser Asn Leu Arg Arg His Gly Arg Thr His Thr Gly Glu Lys 
20 25 30 

Pro Arg 



<210> 24 
<211> 102 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(102) 

<400> 24 
acc ggg gag aag cca tac aag tgt aag gag tgt ggg aaa gcc ttc aac 
48 
Thr Gly Glu Lys Pro Tyr Lys Cys Lys Glu Cys Gly Lys Ala Phe Asn 
1 5 10 15 

cac agc tcc aac ttc aat aaa cac cac aga atc cac acc ggc gaa aag 
96 
His Ser Ser Asn Phe Asn Lys His His Arg Ile His Thr Gly Glu Lys 
20 25 30 

ccg cgg 
102 
Pro Arg 

<210> 25 
<211> 34 
<212> PRT 
<213> Homo sapiens 

<400> 25 
Thr Gly Glu Lys Pro Tyr Lys Cys Lys Glu Cys Gly Lys Ala Phe Asn 
1 5 10 15 

His Ser Ser Asn Phe Asn Lys His His Arg Ile His Thr Gly Glu Lys 
20 25 30 

Pro Arg 

<210> 26 
<211> 102 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(102) 

<400> 26 
acc ggg gag agg cca ttt gaa tgt aag gaa tgt ggg aaa gcc ttt agt 
48 
Thr Gly Glu Arg Pro Phe Glu Cys Lys Glu Cys Gly Lys Ala Phe Ser 
1 5 10 15 

agt ggt tca aac ttc act cga cat cag aga att cac acc ggt gaa aag 
96 
Ser Gly Ser Asn Phe Thr Arg His Gln Arg Ile His Thr Gly Glu Lys 
20 25 30 

ccg cgg 
102 
Pro Arg 

<210> 27 
<211> 34 
<212> PRT 
<213> Homo sapiens 

<400> 27 
Thr Gly Glu Arg Pro Phe Glu Cys Lys Glu Cys Gly Lys Ala Phe Ser 
1 5 10 15 

Ser Gly Ser Asn Phe Thr Arg His Gln Arg Ile His Thr Gly Glu Lys 
20 25 30 

Pro Arg 



<210> 28 
<211> 108 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(108) 

<400> 28 
acc ggg cag aag cca tac gta tgc gat gta gag gga tgt acg tgg aaa 
48 
Thr Gly Gln Lys Pro Tyr Val Cys Asp Val Glu Gly Cys Thr Trp Lys 
1 5 10 15 

ttt gcc cgc tca gat gag ctc aac aga cac aag aaa agg cac acc ggc 
96 
Phe Ala Arg Ser Asp Glu Leu Asn Arg His Lys Lys Arg His Thr Gly 
20 25 30 

gaa aga ccg cgg 
108 
Glu Arg Pro Arg 
35 

<210> 29 
<211> 36 
<212> PRT 
<213> Homo sapiens 

<400> 29 
Thr Gly Gln Lys Pro Tyr Val Cys Asp Val Glu Gly Cys Thr Trp Lys 
1 5 10 15 

Phe Ala Arg Ser Asp Glu Leu Asn Arg His Lys Lys Arg His Thr Gly 
20 25 30 

Glu Arg Pro Arg 
35 

<210> 30 
<211> 102 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(102) 

<400> 30 
acc ggg gag aga cct tac gag tgt aat gaa tgc ggg aaa gct ttt gcc 
48 
Thr Gly Glu Arg Pro Tyr Glu Cys Asn Glu Cys Gly Lys Ala Phe Ala 
1 5 10 15 

caa aat tca act ctc aga gta cac cag aga att cac acc ggc gaa aag 
96 
Gln Asn Ser Thr Leu Arg Val His Gln Arg Ile His Thr Gly Glu Lys 
20 25 30 

ccg cgg 
102 
Pro Arg 

<210> 31 
<211> 34 
<212> PRT 
<213> Homo sapiens 

<400> 31 
Thr Gly Glu Arg Pro Tyr Glu Cys Asn Glu Cys Gly Lys Ala Phe Ala 
1 5 10 15 

Gln Asn Ser Thr Leu Arg Val His Gln Arg Ile His Thr Gly Glu Lys 
20 25 30 

Pro Arg 



<210> 32 
<211> 102 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(102) 

<400> 32 
acc ggg gag agg cct tat gag tgt aat tac tgt gga aaa acc ttt agt 
48 
Thr Gly Glu Arg Pro Tyr Glu Cys Asn Tyr Cys Gly Lys Thr Phe Ser 
1 5 10 15 

gtg agc tca acc ctt att aga cat cag aga atc cac acc ggc gag aga 
96 
Val Ser Ser Thr Leu Ile Arg His Gln Arg Ile His Thr Gly Glu Arg 
20 25 30 

ccg cgg 
102 
Pro Arg 

<210> 33 
<211> 34 
<212> PRT 
<213> Homo sapiens 

<400> 33 
Thr Gly Glu Arg Pro Tyr Glu Cys Asn Tyr Cys Gly Lys Thr Phe Ser 
1 5 10 15 

Val Ser Ser Thr Leu Ile Arg His Gln Arg Ile His Thr Gly Glu Arg 
20 25 30 

Pro Arg 



<210> 34 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 34 
tat cag tgc aac att tgc gga aaa tgt ttc tcc tgc aac tcc aac ctc 
48 
Tyr Gln Cys Asn Ile Cys Gly Lys Cys Phe Ser Cys Asn Ser Asn Leu 
1 5 10 15 

cac agg cac cag aga acg cac 
69 
His Arg His Gln Arg Thr His 
20 

<210> 35 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 35 
Tyr Gln Cys Asn Ile Cys Gly Lys Cys Phe Ser Cys Asn Ser Asn Leu 
1 5 10 15 

His Arg His Gln Arg Thr His 
20 

<210> 36 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 36 
tat gca tgt cat cta tgt gga aaa gcc ttc act cag agt tct cac ctt 
48 
Tyr Ala Cys His Leu Cys Gly Lys Ala Phe Thr Gln Ser Ser His Leu 
1 5 10 15 

aga aga cat gag aaa act cac 
69 
Arg Arg His Glu Lys Thr His 
20 

<210> 37 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 37 
Tyr Ala Cys His Leu Cys Gly Lys Ala Phe Thr Gln Ser Ser His Leu 
1 5 10 15 

Arg Arg His Glu Lys Thr His 
20 

<210> 38 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 38 
tat aaa tgc ggc cag tgt ggg aag ttc tac tcg cag gtc tcc cac ctc 
48 
Tyr Lys Cys Gly Gln Cys Gly Lys Phe Tyr Ser Gln Val Ser His Leu 
1 5 10 15 

acc cgc cac cag aaa atc cac 
69 
Thr Arg His Gln Lys Ile His 
20 

<210> 39 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 39 
Tyr Lys Cys Gly Gln Cys Gly Lys Phe Tyr Ser Gln Val Ser His Leu 
1 5 10 15 

Thr Arg His Gln Lys Ile His 
20 

<210> 40 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 40 
tat gca tgt cat cta tgt gga aaa gcc ttc act cag tgt tct cac ctt 
48 
Tyr Ala Cys His Leu Cys Gly Lys Ala Phe Thr Gln Cys Ser His Leu 
1 5 10 15 

aga aga cat gag aaa act cac 
69 
Arg Arg His Glu Lys Thr His 
20 

<210> 41 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 41 
Tyr Ala Cys His Leu Cys Gly Lys Ala Phe Thr Gln Cys Ser His Leu 
1 5 10 15 

Arg Arg His Glu Lys Thr His 
20 

<210> 42 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 42 
tat gca tgt cat cta tgt gca aaa gcc ttc att cag tgt tct cac ctt 
48 
Tyr Ala Cys His Leu Cys Ala Lys Ala Phe Ile Gln Cys Ser His Leu 
1 5 10 15 

aga aga cat gag aaa act cac 
69 
Arg Arg His Glu Lys Thr His 
20 

<210> 43 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 43 
Tyr Ala Cys His Leu Cys Ala Lys Ala Phe Ile Gln Cys Ser His Leu 
1 5 10 15 

Arg Arg His Glu Lys Thr His 
20 

<210> 44 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 44 
tat gtt tgc agg gaa tgt ggg cgt ggc ttt cgc cag cat tca cac ctg 
48 
Tyr Val Cys Arg Glu Cys Gly Arg Gly Phe Arg Gln His Ser His Leu 
1 5 10 15 

gtc aga cac aag agg aca cat 
69 
Val Arg His Lys Arg Thr His 
20 

<210> 45 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 45 
Tyr Val Cys Arg Glu Cys Gly Arg Gly Phe Arg Gln His Ser His Leu 
1 5 10 15 

Val Arg His Lys Arg Thr His 
20 

<210> 46 
<211> 69 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> CDS 
<222> (1)...(69) 

<400> 46 
ttt gag tgt aaa gat tgc ggg aaa gct ttc att cag aag tca aac ctc 
48 
Phe Glu Cys Lys Asp Cys Gly Lys Ala Phe Ile Gln Lys Ser Asn Leu 
1 5 10 15 

atc aga cac cag aga act cac 
69 
Ile Arg His Gln Arg Thr His 
20 

<210> 47 
<211> 23 
<212> PRT 
<213> Homo sapiens 

<400> 47 
Phe Glu Cys Lys Asp Cys Gly Lys Ala Phe Ile Gln Lys Ser Asn Leu 
1 5 10 15 

Ile Arg His Gln Arg Thr His 
20 

<210> 48 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 48 

tat gtc tgc agg gag tgt agg cga ggt ttt age cag aag tea aat etc 
48 

Tyr Val Cys Arg Glu Cys Arg Arg Gly Phe Ser Gin Lys Ser Asn Leu 
15 10 15 

ate aga cac cag agg acg cac 
69 

He Arg His Gin Arg Thr His 
20 



<210> 49 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 49 

Tyr Val Cys Arg Glu Cys Arg Arg Gly Phe Ser Gln Lys Ser Asn Leu

1 5 10 15

Ile Arg His Gln Arg Thr His
20 



<210> 50 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 50

tat gaa tgt aac aca tgc agg aaa acc ttc tct caa aag tca aat ctc
48

Tyr Glu Cys Asn Thr Cys Arg Lys Thr Phe Ser Gln Lys Ser Asn Leu
1 5 10 15

att gta cat cag aga aca cac
69

Ile Val His Gln Arg Thr His
20 



<210> 51 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 51 

Tyr Glu Cys Asn Thr Cys Arg Lys Thr Phe Ser Gln Lys Ser Asn Leu

1 5 10 15

Ile Val His Gln Arg Thr His
20 



<210> 52 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 



<400> 52 

tat gtt tgc tca aaa tgt ggg aaa gcc ttc act cag agt tca aat ctg
48

Tyr Val Cys Ser Lys Cys Gly Lys Ala Phe Thr Gln Ser Ser Asn Leu
1 5 10 15

act gta cat caa aaa atc cac
69

Thr Val His Gln Lys Ile His
20 



<210> 53 
<211> 23 
<212> PRT 

<213> Homo sapiens 



<400> 53 

Tyr Val Cys Ser Lys Cys Gly Lys Ala Phe Thr Gln Ser Ser Asn Leu

1 5 10 15

Thr Val His Gln Lys Ile His
20 



<210> 54 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220>

<221> CDS 

<222> (1) . . . (69) 

<400> 54 

tac aaa tgt gac gaa tgt gga aaa aac ttt acc cag tcc tcc aac ctt
48

Tyr Lys Cys Asp Glu Cys Gly Lys Asn Phe Thr Gln Ser Ser Asn Leu
1 5 10 15

att gta cat aag aga att cat
69

Ile Val His Lys Arg Ile His
20 



<210> 55 

<211> 23 

<212> PRT 

<213> Homo sapiens 

<400> 55 

Tyr Lys Cys Asp Glu Cys Gly Lys Asn Phe Thr Gln Ser Ser Asn Leu

1 5 10 15

Ile Val His Lys Arg Ile His
20 

<210> 56 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69)

<400> 56 

tat gaa tgt gat gtg tgt gga aaa acc ttc acg caa aag tca aac ctt
48

Tyr Glu Cys Asp Val Cys Gly Lys Thr Phe Thr Gln Lys Ser Asn Leu
1 5 10 15

ggt gta cat cag aga act cat
69

Gly Val His Gln Arg Thr His
20 



<210> 57 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 57 

Tyr Glu Cys Asp Val Cys Gly Lys Thr Phe Thr Gln Lys Ser Asn Leu

1 5 10 15

Gly Val His Gln Arg Thr His
20

<210> 58

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220>
<221> CDS 
<222> (1) . . . (69) 

<400> 58 

tat aag tgc cct gat tgt ggg aag agt ttt agt cag agt tcc agc ctc
48

Tyr Lys Cys Pro Asp Cys Gly Lys Ser Phe Ser Gln Ser Ser Ser Leu
1 5 10 15

att cgc cac cag cgg aca cac
69

Ile Arg His Gln Arg Thr His
20 



<210> 59 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 59 

Tyr Lys Cys Pro Asp Cys Gly Lys Ser Phe Ser Gln Ser Ser Ser Leu

1 5 10 15

Ile Arg His Gln Arg Thr His
20 

<210> 60 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 60 

tat gag tgt cag gac tgt ggg agg gcc ttc aac cag aac tcc tcc ctg
48

Tyr Glu Cys Gln Asp Cys Gly Arg Ala Phe Asn Gln Asn Ser Ser Leu
1 5 10 15

ggg cgg cac aag agg aca cac
69 

Gly Arg His Lys Arg Thr His 
20 



<210> 61 

<211> 23 

<212> PRT 

<213> Homo sapiens 

<400> 61 

Tyr Glu Cys Gln Asp Cys Gly Arg Ala Phe Asn Gln Asn Ser Ser Leu

1 5 10 15

Gly Arg His Lys Arg Thr His
20



<210> 62 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1)...(69) 

<400> 62 

tac aaa tgt gaa gaa tgt ggc aaa gct ttt aac cag tcc tca acc ctt
48

Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Asn Gln Ser Ser Thr Leu
1 5 10 15

act aga cat aag ata gtt cat
69

Thr Arg His Lys Ile Val His
20 



<210> 63 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 63 

Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Asn Gln Ser Ser Thr Leu

1 5 10 15

Thr Arg His Lys Ile Val His
20 

<210> 64 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 64 

tat aag tgc atg gag tgt ggg aag gct ttt aac cgc agg tca cac ctc
48

Tyr Lys Cys Met Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser His Leu
1 5 10 15

aca cgg cac cag cgg att cac
69

Thr Arg His Gln Arg Ile His
20 



<210> 65 
<211> 23 
<212> PRT 




<213> Homo sapiens 
<400> 65 

Tyr Lys Cys Met Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser His Leu 

1 5 10 15

Thr Arg His Gln Arg Ile His
20 

<210> 66 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 66 

tat aca tgt aaa cag tgt ggg aaa gcc ttc agt gtt tcc agt tcc ctt
48

Tyr Thr Cys Lys Gln Cys Gly Lys Ala Phe Ser Val Ser Ser Ser Leu
1 5 10 15

cga aga cat gaa acc act cac 
69 

Arg Arg His Glu Thr Thr His 
20 



<210> 67
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 67 

Tyr Thr Cys Lys Gln Cys Gly Lys Ala Phe Ser Val Ser Ser Ser Leu

1 5 10 15

Arg Arg His Glu Thr Thr His 
20 

<210> 68 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT
<222> 15

<223> Xaa = hydrophobic residue

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 68 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Ser Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 69 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 
<221> VARIANT 

<222> 15

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 69 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa His Xaa Ser Asn Xaa Xaa 

1 5 10 15

Lys His Xaa His 
20 

<210> 70 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT

<222> 2, 6-8, 10, 12, 16

<223> Xaa = any amino acid

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 70 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Ser Xaa Ser Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 71 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 1, 9

<223> Xaa = Phe or Tyr

<221> VARIANT
<222> 15

<223> Xaa = hydrophobic residue

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 71 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser Thr Xaa Xaa

1 5 10 15

Val His Xaa His 
20 

<210> 72 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 



<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 
<222> 14 

<223> Xaa = Ser or Thr



<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> (19) . . . (19) 

<223> Xaa = any amino acid; 3-5 amino acids in length 



<400> 72 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Val Xaa Ser Xaa Xaa Xaa

1 5 10 15

Arg His Xaa His
20

<210> 73
<211> 20
<212> PRT

<213> Artificial Sequence



<220> 

<223> purified polypeptide 



<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 



<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 



<221> VARIANT
<222> 19

<223> Xaa = any amino acid; 3-5 amino acids in length



<400> 73 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser His Xaa Xaa

1 5 10 15

Arg His Xaa His 
20 

<210> 74 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 74 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser Asn Xaa Xaa

1 5 10 15

Val His Xaa His 
20 

<210> 75 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 






<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr

<221> VARIANT
<222> 14

<223> Xaa = Ser or Thr

<221> VARIANT
<222> 15

<223> Xaa = hydrophobic residue

<221> VARIANT
<222> (19) . . . (19)

<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 75 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser Xaa Xaa Xaa

1 5 10 15

Arg His Xaa His 
20 

<210> 76 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> coordinating residue 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 
<221> VARIANT 

<222> 2, 6-8, 10-14, 16, 17 
<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 76 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

1 5 10 15

Xaa His Xaa His 
20 

<210> 77 
<211> 24 
<212> PRT 

<213> Artificial Sequence 






<220> 

<223> polypeptide motif

<221> VARIANT
<222> 1

<223> Xaa = Leu, Ile, Val, Met, Phe, Tyr, or Gly

<221> VARIANT 
<222> 2 

<223> Xaa = Ala, Ser, Leu, Val, or Arg 
<221> VARIANT 

<222> 3-4, 6, 8-11, 17, 19-23 
<223> Xaa = any amino acid 

<221> VARIANT 
<222> 5 

<223> Xaa = Leu, Ile, Val, Met, Ser, Thr, Ala, Cys, or
Asn

<221> VARIANT
<222> 7

<223> Xaa = Leu, Ile, Val, or Met

<221> VARIANT

<222> (12) . . . (12)

<223> Xaa = Leu, Ile, or Val



<221> VARIANT 
<222> (13) . . . (13) 

<223> Xaa = Arg, Lys, Asn, Gln, Glu, Ser, Thr, Ala, Ile,
or Tyr

<221> VARIANT
<222> (14) . . . (14)

<223> Xaa = Leu, Ile, Val, Phe, Ser, Thr, Asn, Lys, or
His 



<221> VARIANT 
<222> (16) . . . (16) 

<223> Xaa = Phe, Tyr, Val, or Cys 

<221> VARIANT 
<222> (18) . . . (18) 

<223> Xaa = Asn, Asp, Gln, Thr, Ala, or His

<221> VARIANT
<222> (24) . . . (24)

<223> Xaa = Arg, Lys, Asn, Ala, Ile, Met, or Trp



<400> 77 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Trp Xaa

1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20



<210> 78 
<211> 6 
<212> PRT 
<213> Eukaryote 
<220>

<221> VARIANT
<222> 3

<223> Xaa = Glu or Gln

<221> VARIANT 
<222> 4 

<223> Xaa = Lys or Arg 

<221> VARIANT 
<222> 6 

<223> Xaa = Tyr or Phe 
<400> 78 

Thr Gly Xaa Xaa Pro Xaa 
1 5 

<210> 79 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 
<400> 79 

tgcctgcagc atntgtggga ggaagtttg 
29 

<210> 80 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide
<400> 80 

atgctgcagg cttaaggctt ctcgccggtg 
30 

<210> 81 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> primer for PCR 

<221> misc_feature 
<222> (0) . . . (0) 

<223> n = A, T, G, or C; y = T or C; s = G or C; r = G 
or A 

<400> 81 

gcgtccggac ncayacnggn sara 
24 

<210> 82 
<211> 24 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> primer for PCR

<221> misc_feature
<222> (0) . . . (0) 

<223> n = A, T, G, or C; b = G, C, or T; r = G or A; w = 
A or T; y = T or C 

<400> 82 

cggaattcan nbrwanggyy tytc 
24 

<210> 83 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> amino acid motif 

<221> VARIANT 
<222> 4 

<223> Xaa = Glu or Gln

<221> VARIANT 
<222> 5 

<223> Xaa = Lys or Arg 

<221> VARIANT 
<222> 7

<223> Xaa = Tyr or Phe 

<400> 83
His Thr Gly Xaa Xaa Pro Xaa 
1 5 

<210> 84 
<211> 54 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 
<400> 84 

gggcccgggg agaagcctta cgcatgtcca gtcgaatctt gtgatagaag attc 
54 

<210> 85 
<211> 75 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 
<221> misc_feature
<222> (0) . . . (0)

<223> n = A, T, G, or C; b = G, C, or T; s = G or C
<400> 85 

ctccccgcgg ttcgccggtg tggattctga tatgsnbsnb aagsnbsnbs nbsnbtgaga 
60 

atcttctatc acaag
75 

<210> 86 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 
<400> 86 

ctagacccgg gaattcgtcg acg 
23 

<210> 87 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 
<400> 87 

gatccgtcga cgaattcccg ggt 
23 

<210> 88 
<211> 38 
<212> DNA 

<213> Artificial Sequence
<220>

<223> synthetic oligonucleotide

<221> misc_feature

<222> (0) . . . (0) 

<223> n = A, T, G, or C 

<400> 88 

ccggtnnntg ggcgtacnnn tgggcgtcan nntgggcg 
38 

<210> 89 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic oligonucleotide 

<221> misc_feature

<222> (0)...(0)

<223> n = A, T, G, or C
<400> 89

tcgacgccca nnntgacgcc cannngtacg cccannna 
38 

<210> 90 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 90 

ccgggtcgcg cgtgggcggt accg 
24 

<210> 91 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 91 

tcgacggtac cgcccacgcg cgac 
24 

<210> 92 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 92 

ccgggtcgcg agcgggcggt accg 
24 

<210> 93 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 93 

tcgacggtac cgcccgctcg cgac 
24 

<210> 94 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 94

ccgggtcgtg cttgggcggt accg 
24 

<210> 95 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 95 

tcgacggtac cgcccaagca cgac 
24 

<210> 96 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 96 

ccgggtcggg actgggcggt accg 
24 

<210> 97 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay
<400> 97 

tcgacggtac cgcccagtcc cgac 
24 

<210> 98 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 98 

ccgggtcggg agtgggcggt accg 
24 

<210> 99 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 99

tcgacggtac cgcccactcc cgac 
24 



<210> 100 
<211> 24 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> synthetic probe for gel shift assay 
<400> 100 

ccgggtcgga catgggcggt accg 
24 



<210> 101 
<211> 24
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> synthetic probe for gel shift assay 
<400> 101 

tcgacggtac cgcccatgtc cgac 
24 



<210> 102 
<211> 69 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> CDS

<222> (1) . . . (69) 

<400> 102 

tat aag tgt aag gaa tgt ggg cag gcc ttt aga cag cgt gca cat ctt 
48 

Tyr Lys Cys Lys Glu Cys Gly Gln Ala Phe Arg Gln Arg Ala His Leu
1 5 10 15

att cga cat cac aaa ctt cac
69

Ile Arg His His Lys Leu His
20 



<210> 103 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 103 

Tyr Lys Cys Lys Glu Cys Gly Gln Ala Phe Arg Gln Arg Ala His Leu

1 5 10 15

Ile Arg His His Lys Leu His
20

<210> 104
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 104 

tat aag tgt cat caa tgt ggg aaa gcc ttt att caa tcc ttt aac ctt
48

Tyr Lys Cys His Gln Cys Gly Lys Ala Phe Ile Gln Ser Phe Asn Leu
1 5 10 15

cga aga cat gag aga act cac 
69 

Arg Arg His Glu Arg Thr His 
20 



<210> 105 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 105 

Tyr Lys Cys His Gln Cys Gly Lys Ala Phe Ile Gln Ser Phe Asn Leu

1 5 10 15

Arg Arg His Glu Arg Thr His 
20 

<210> 106 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 106 

ttc cag tgt aat cag tgt ggg gca tct ttt act cag aaa ggt aac ctc
48

Phe Gln Cys Asn Gln Cys Gly Ala Ser Phe Thr Gln Lys Gly Asn Leu
1 5 10 15

ctc cgc cac att aaa ctg cac
69

Leu Arg His Ile Lys Leu His
20 



<210> 107 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 107 

Phe Gln Cys Asn Gln Cys Gly Ala Ser Phe Thr Gln Lys Gly Asn Leu

1 5 10 15

Leu Arg His Ile Lys Leu His
20



<210> 108 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> primer for PCR 



<221> misc_feature 
<222> (0) . . . (0) 

<223> n = A, T, G, or C; 48-51 nucleotides in length



<400> 108 

acccacactg gccagaaacc cn 
22 



<210> 109 
<211> 22 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> primer for PCR 



<221> misc_feature 
<222> (0) . . . (0) 

<223> n = A, T, G, or C; 42-45 nucleotides in length 



<400> 109 

gatctgaatt cattcaccgg tn 
22 



<210> 110 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 110 

tac aaa tgt gaa gaa tgt ggc aaa gcc ttt agg cag tcc tca cac ctt
48

Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Arg Gln Ser Ser His Leu
1 5 10 15

act aca cat aag ata att cat
69

Thr Thr His Lys Ile Ile His
20 



<210> 111 
<211> 23 
<212> PRT

<213> Homo sapiens 
<400> 111 

Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Arg Gln Ser Ser His Leu

1 5 10 15

Thr Thr His Lys Ile Ile His
20 

<210> 112 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 112 

tat gag tgt gat cac tgt gga aaa tcc ttt agc cag agc tct cat ctg
48

Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser His Leu
1 5 10 15

aat gtg cac aaa aga act cac 
69 

Asn Val His Lys Arg Thr His 
20 



<210> 113 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 113 

Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser His Leu

1 5 10 15

Asn Val His Lys Arg Thr His
20

<210> 114 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69)

<400> 114 

tac atg tgc agt gag tgt ggg cga ggc ttc agc cag aag tca aac ctc
48

Tyr Met Cys Ser Glu Cys Gly Arg Gly Phe Ser Gln Lys Ser Asn Leu
1 5 10 15

atc ata cac cag agg aca cac
69

Ile Ile His Gln Arg Thr His
20

<210> 115
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 115 

Tyr Met Cys Ser Glu Cys Gly Arg Gly Phe Ser Gln Lys Ser Asn Leu

1 5 10 15

Ile Ile His Gln Arg Thr His
20 

<210> 116 
<211> 69
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 116 

tat gaa tgt gaa aaa tgt ggc aaa gct ttt aac cag tcc tca aat ctt
48

Tyr Glu Cys Glu Lys Cys Gly Lys Ala Phe Asn Gln Ser Ser Asn Leu
1 5 10 15

act aga cat aag aaa agt cat 
69 

Thr Arg His Lys Lys Ser His 
20 



<210> 117 

<211> 23
<212> PRT 

<213> Homo sapiens 
<400> 117 

Tyr Glu Cys Glu Lys Cys Gly Lys Ala Phe Asn Gln Ser Ser Asn Leu

1 5 10 15

Thr Arg His Lys Lys Ser His 
20 

<210> 118 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 118 

tat gag tgc aat gaa tgt ggg aag ttt ttt agc cag agc tcc agc ctc
48

Tyr Glu Cys Asn Glu Cys Gly Lys Phe Phe Ser Gln Ser Ser Ser Leu
1 5 10 15

att aga cat agg aga agt cac
69

Ile Arg His Arg Arg Ser His
20



<210> 119 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 119 

Tyr Glu Cys Asn Glu Cys Gly Lys Phe Phe Ser Gln Ser Ser Ser Leu

1 5 10 15

Ile Arg His Arg Arg Ser His
20 

<210> 120 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69)

<400> 120 

tat gag tgt cac gat tgc gga aag tcc ttt agg cag agc acc cac ctc
48

Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu
1 5 10 15

act cag cac cgg agg atc cac
69

Thr Gln His Arg Arg Ile His
20 



<210> 121 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 121 

Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu

1 5 10 15

Thr Gln His Arg Arg Ile His
20 

<210> 122 
<211> 69 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 122 

tat gag tgt cac gat tgc gga aag tcc ttt agg cag agc acc cac ctc
48

Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu
1 5 10 15

act cgg cac cgg agg atc cac
69

Thr Arg His Arg Arg Ile His
20



<210> 123 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 123 

Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu

1 5 10 15

Thr Arg His Arg Arg Ile His
20 

<210> 124 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 124 

cac aag tgc ctt gaa tgt ggg aaa tgc ttc agt cag aac acc cat ctg
48

His Lys Cys Leu Glu Cys Gly Lys Cys Phe Ser Gln Asn Thr His Leu
1 5 10 15

act cgc cac caa cgc acc cac
69

Thr Arg His Gln Arg Thr His
20


<210> 125 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 125 

His Lys Cys Leu Glu Cys Gly Lys Cys Phe Ser Gln Asn Thr His Leu

1 5 10 15

Thr Arg His Gln Arg Thr His
20


<210> 126 

<211> 75 

<212> DNA 

<213> Homo sapiens 



<220>

<221> CDS

<222> (1) . . . (75)

<400> 126

tac cac tgt gac tgg gac ggc tgt gga tgg aaa ttc gcc cgc tca gat
48

Tyr His Cys Asp Trp Asp Gly Cys Gly Trp Lys Phe Ala Arg Ser Asp
1 5 10 15

gaa ctg acc agg cac tac cgt aaa cac 
75 

Glu Leu Thr Arg His Tyr Arg Lys His 

20 25 



<210> 127 
<211> 25 
<212> PRT 

<213> Homo sapiens 
<400> 127 

Tyr His Cys Asp Trp Asp Gly Cys Gly Trp Lys Phe Ala Arg Ser Asp 

1 5 10 15

Glu Leu Thr Arg His Tyr Arg Lys His 
20 25 

<210> 128 
<211> 75 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (75) 

<400> 128 

tac aga tgc tca tgg gaa ggg tgt gag tgg cgt ttt gca aga agt gat
48

Tyr Arg Cys Ser Trp Glu Gly Cys Glu Trp Arg Phe Ala Arg Ser Asp
1 5 10 15

gag tta acc agg cac ttc cga aag cac 
75 

Glu Leu Thr Arg His Phe Arg Lys His 

20 25 



<210> 129 
<211> 25 
<212> PRT 

<213> Homo sapiens 
<400> 129 

Tyr Arg Cys Ser Trp Glu Gly Cys Glu Trp Arg Phe Ala Arg Ser Asp 

1 5 10 15

Glu Leu Thr Arg His Phe Arg Lys His 
20 25 

<210> 130 

<211> 75 

<212> DNA 

<213> Homo sapiens 

<220>

<221> CDS 

<222> (1) . . . (75) 

<400> 130

ttc agc tgt agc tgg aaa ggt tgt gaa agg agg ttt gcc cgt tct gat
48

Phe Ser Cys Ser Trp Lys Gly Cys Glu Arg Arg Phe Ala Arg Ser Asp
1 5 10 15

gaa ctg tcc aga cac agg cga acc cac
75 

Glu Leu Ser Arg His Arg Arg Thr His 
20 25 



<210> 131 
<211> 25 
<212> PRT 

<213> Homo sapiens 
<400> 131 

Phe Ser Cys Ser Trp Lys Gly Cys Glu Arg Arg Phe Ala Arg Ser Asp 

1 5 10 15

Glu Leu Ser Arg His Arg Arg Thr His 
20 25 

<210> 132 

<211> 75 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (1) . . . (75)

<400> 132

ttc gcc tgc agc tgg cag gac tgc aac aag aag ttc gcg cgc tcc gac
48

Phe Ala Cys Ser Trp Gln Asp Cys Asn Lys Lys Phe Ala Arg Ser Asp
1 5 10 15

gag ctg gcg cgg cac tac cgc aca cac
75 

Glu Leu Ala Arg His Tyr Arg Thr His 
20 25 



<210> 133 
<211> 25 
<212> PRT 

<213> Homo sapiens 
<400> 133 

Phe Ala Cys Ser Trp Gln Asp Cys Asn Lys Lys Phe Ala Arg Ser Asp

1 5 10 15

Glu Leu Ala Arg His Tyr Arg Thr His
20 25

<210> 134
<211> 75 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (75) 

<400> 134 

tac cac tgc aac tgg gac ggc tgc ggc tgg aag ttt gcg cgc tca gac
48

Tyr His Cys Asn Trp Asp Gly Cys Gly Trp Lys Phe Ala Arg Ser Asp
1 5 10 15

gag ctc acg cgc cac tac cga aag cac
75 

Glu Leu Thr Arg His Tyr Arg Lys His 

20 25 



<210> 135 
<211> 25 
<212> PRT 

<213> Homo sapiens 

<400> 135

Tyr His Cys Asn Trp Asp Gly Cys Gly Trp Lys Phe Ala Arg Ser Asp

1 5 10 15

Glu Leu Thr Arg His Tyr Arg Lys His 
20 25 

<210> 136 
<211> 72 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (72) 

<400> 136 

ttc ctc tgt cag tat tgt gca cag aga ttt ggg cga aag gat cac ctg
48

Phe Leu Cys Gln Tyr Cys Ala Gln Arg Phe Gly Arg Lys Asp His Leu
1 5 10 15

act cga cat atg aag aag agt cac 
72 

Thr Arg His Met Lys Lys Ser His 
20 



<210> 137 
<211> 24 
<212> PRT 

<213> Homo sapiens 
<400> 137 

Phe Leu Cys Gln Tyr Cys Ala Gln Arg Phe Gly Arg Lys Asp His Leu

1 5 10 15

Thr Arg His Met Lys Lys Ser His 
20 

<210> 138 
<211> 78 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> primer for PCR 
<400> 138 

tgtcgaatct gcatgcgtaa cttcagtcgt agtgaccacc ttaccaccca catccggacc 
60 

cacactggcc agaaaccc 
78 

<210> 139 
<211> 81 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> primer for PCR 
<400> 139 

ggtggcggcc gttacttact tagagctcga cgtcttactt acttagcggc cgcactagta 
60 

gatctgaatt cattcaccgg t 
81 

<210> 140 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (1) . . . (69)

<400> 140 

ttc cag tgt aaa act tgt cag cga aag ttc tcc cgg tcc gac cac ctg
48

Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu
1 5 10 15

aag acc cac acc agg act cat 
69 

Lys Thr His Thr Arg Thr His 
20 



<210> 141 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 141 

Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu

1 5 10 15

Lys Thr His Thr Arg Thr His
20 

<210> 142 

<211> 69

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 142 

ttt gcc tgc gag gtc tgc ggt gtt cga ttc acc agg aac gac aag ctg
48

Phe Ala Cys Glu Val Cys Gly Val Arg Phe Thr Arg Asn Asp Lys Leu
1 5 10 15

aag atc cac atg cgg aag cac
69

Lys Ile His Met Arg Lys His
20



<210> 143 
<211> 23 
<212> PRT 

<213> Homo sapiens 

<400> 143

Phe Ala Cys Glu Val Cys Gly Val Arg Phe Thr Arg Asn Asp Lys Leu

1 5 10 15

Lys Ile His Met Arg Lys His
20 

<210> 144 

<211> 75 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (75) 

<400> 144 

tat gta tgc gat gta gag gga tgt acg tgg aaa ttt gcc cgc tca gat
48

Tyr Val Cys Asp Val Glu Gly Cys Thr Trp Lys Phe Ala Arg Ser Asp
1 5 10 15

aag ctc aac aga cac aag aaa agg cac
75 

Lys Leu Asn Arg His Lys Lys Arg His 
20 25 



<210> 145 
<211> 25 
<212> PRT 

<213> Homo sapiens
<400> 145 

Tyr Val Cys Asp Val Glu Gly Cys Thr Trp Lys Phe Ala Arg Ser Asp 

1 5 10 15

Lys Leu Asn Arg His Lys Lys Arg His 
20 25 

<210> 146 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 146 

tat att tgc aga aag tgt gga cgg ggc ttt agt cgg aag tcc aac ctt
48

Tyr Ile Cys Arg Lys Cys Gly Arg Gly Phe Ser Arg Lys Ser Asn Leu
1 5 10 15

atc aga cat cag agg aca cac
69

Ile Arg His Gln Arg Thr His
20 



<210> 147 
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 147 

Tyr Ile Cys Arg Lys Cys Gly Arg Gly Phe Ser Arg Lys Ser Asn Leu

1 5 10 15

Ile Arg His Gln Arg Thr His
20 

<210> 148 

<211> 69 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (1) . . . (69) 

<400> 148 

tat cta tgt agt gag tgt gac aaa tgc ttc agt aga agt aca aac ctc
48 

Tyr Leu Cys Ser Glu Cys Asp Lys Cys Phe Ser Arg Ser Thr Asn Leu 
1 5 10 15 

ata agg cat cga aga act cac 
69 

Ile Arg His Arg Arg Thr His
20

<210> 149
<211> 23 
<212> PRT 

<213> Homo sapiens 
<400> 149 

Tyr Leu Cys Ser Glu Cys Asp Lys Cys Phe Ser Arg Ser Thr Asn Leu 

1 5 10 15

Ile Arg His Arg Arg Thr His
20 

<210> 150 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 150 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ala His Xaa Xaa

1 5 10 15

Arg His Xaa His 
20 

<210> 151 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr

<221> VARIANT

<222> 2, 6-8, 10, 12, 16

<223> Xaa = any amino acid

<221> VARIANT
<222> 4

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT
<222> 15

<223> Xaa = hydrophobic residue

<221> VARIANT
<222> 19

<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 151

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Phe Asn Xaa Xaa

1 5 10 15

Arg His Xaa His
20

<210> 152
<211> 20
<212> PRT

<213> Artificial Sequence
<220>

<223> purified polypeptide

<221> VARIANT
<222> 1, 9

<223> Xaa = Phe or Tyr

<221> VARIANT

<222> 2, 6-8, 10, 12, 16

<223> Xaa = any amino acid

<221> VARIANT
<222> 4

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT
<222> 15

<223> Xaa = hydrophobic residue

<221> VARIANT
<222> 19

<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 152

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser His Xaa Xaa

1 5 10 15

Thr His Xaa His
20

<210> 153
<211> 20
<212> PRT

<213> Artificial Sequence
<220>

<223> purified polypeptide

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 153

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser His Xaa Xaa

1 5 10 15 

Val His Xaa His 
20 

<210> 154 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 154 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser Asn Xaa Xaa

1 5 10 15

Ile His Xaa His
20

<210> 155 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> CONFLICT 
<222> 15 

<223> Xaa = hydrophobic residue

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 155 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Ser Asn Xaa Xaa

1 5 10 15

Arg His Xaa His 
20 

<210> 156 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 
<221> VARIANT
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 156 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Thr His Xaa Xaa

1 5 10 15

Gln His Xaa His
20 



<210> 157 
<211> 18 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 2 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 

<222> 4-6, 8, 10, 14 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 7 

<223> Xaa = Phe or Tyr 



<221> VARIANT 
<222> 13 

<223> Xaa = hydrophobic residue 



<221> VARIANT 
<222> 17 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 157 

Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gln Xaa Thr His Xaa Xaa Arg His

1 5 10 15

Xaa His 



<210> 158 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 
<221> VARIANT
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 158 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Asp Lys Xaa Xaa 

1 5 10 15

Ile His Xaa His
20 

<210> 159 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 159 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Ser Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 160 
<211> 20 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> purified polypeptide 
<221> VARIANT
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 160 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Thr Xaa Gly Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 161 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 161 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Gin Xaa Gly Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 
<210> 162
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 162 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Asp Glu Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 163 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 
<223> Xaa = any amino acid; 3-5 amino acids in length
<400> 163

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Asp His Xaa Xaa

1 5 10 15

Arg His Xaa His
20

<210> 164 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid 

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 164 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Asp His Xaa Xaa 

1 5 10 15

Thr His Xaa His 
20 

<210> 165 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid

<221> VARIANT 
<222> 4 
<223> Xaa = any amino acid; 2-5 amino acids in length

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 165 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Asp Lys Xaa Xaa 

1 5 10 15 

Arg His Xaa His 
20 

<210> 166 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 

<223> Xaa = Phe or Tyr 

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16

<223> Xaa = any amino acid

<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 

<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 

<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 166 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Ser His Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 

<210> 167 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> purified polypeptide 

<221> VARIANT 
<222> 1, 9 
<223> Xaa = Phe or Tyr

<221> VARIANT 

<222> 2, 6-8, 10, 12, 16 

<223> Xaa = any amino acid



<221> VARIANT 
<222> 4 

<223> Xaa = any amino acid; 2-5 amino acids in length 



<221> VARIANT 
<222> 15 

<223> Xaa = hydrophobic residue 



<221> VARIANT 
<222> 19 

<223> Xaa = any amino acid; 3-5 amino acids in length 
<400> 167 

Xaa Xaa Cys Xaa Cys Xaa Xaa Xaa Xaa Xaa Arg Xaa Thr Asn Xaa Xaa 

1 5 10 15

Arg His Xaa His 
20 